I am in the midst of rereading Bob Katz's Mastering Audio (2nd edition), so I will reference things that are fresh in my mind from this book as we pass through
The discussion on golden ears makes perfect sense - any exceptional individual in any field of endeavour is a combination of training, practice and some natural ability/predisposition within that field
About a third of the way down the page I found what I consider to be a disingenuous statement: that in playing back 24/192 files we will be playing back the whole spectrum (20Hz-96kHz, more or less) that such a file can represent
This is disingenuous because:
- we do not have transducers that can play back anything much above 25kHz within the typical home/professional listening arena
- any playback hardware/software would ordinarily incorporate a low pass filter set to the upper boundary of human hearing (i.e. the LPF would have a corner frequency of ca. 22kHz anyway)
- the tests provided are also disingenuous, as they include shifting entire tracks by 24kHz, which has the potential to introduce distortions into the track before playback from frequency ranges that are, understandably, imperceptible to us
Whilst it is true that a sampling rate of 192kHz allows frequencies up to ca. 96kHz to be sampled, there is rarely any appreciable content above 20kHz (some sounds do carry higher-order harmonics, but these would be negligible in terms of meaningfulness at playback). Most of the equipment in a domestic playback chain is not capable of reproducing the frequencies supposedly captured at 192kHz and would simply treat any such content as invisible (rather than fold it down against some non-existent lower Nyquist frequency), especially if the LPF is doing its job at 22kHz
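To make the folding point concrete, here is a toy sketch of my own (not from the article or the book): a pure tone only aliases down into the audible band if it actually reaches a converter whose Nyquist frequency sits below it - which is exactly what the anti-alias LPF is there to prevent.

```python
# Toy illustration (mine): where a pure tone lands after sampling.
# Content above fs/2 folds down ("aliases") only if it reaches the
# converter; a proper anti-alias LPF removes it beforehand.

def alias_frequency(f_hz, fs_hz):
    """Return the frequency a pure tone of f_hz appears at when sampled at fs_hz."""
    nyquist = fs_hz / 2
    f = f_hz % fs_hz          # sampling is periodic in fs
    if f > nyquist:
        f = fs_hz - f         # fold back across the Nyquist frequency
    return f

# At 192kHz there is headroom up to 96kHz, so a 30kHz tone is captured as-is:
print(alias_frequency(30_000, 192_000))   # 30000
# At 44.1kHz the same tone would fold down into the audible band
# if it were not filtered out before conversion:
print(alias_frequency(30_000, 44_100))    # 14100
```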
the other arguments about bit depth are relevant: at playback, after finalisation of the project, anything more than 16 bits is wasted
- in most cases the AD/DA section of your interface is only going to give you 20 bits or less of real resolution, so it will effectively be truncating your bit depth anyway
- Bob Katz (amongst many other engineers) recommends that for production (anything prior to finalisation) using 24 bits is imperative to ensure that no rounding errors are introduced
- He also recommends that you dither only once, when committing the final finished project after mastering
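A quick sketch of the arithmetic behind those bullets (my own toy code, not Katz's method): the theoretical dynamic range of N bits is roughly 6dB per bit, and TPDF dither at the final 16-bit quantisation is what decorrelates the rounding error you would otherwise bake in.

```python
import random

# Toy illustration (mine): quantising a high-resolution sample to 16 bits.
# Plain truncation leaves correlated rounding error; TPDF dither
# decorrelates it - hence "dither once, at the very end".

def quantise_16bit(x, dither=False):
    """Quantise x (a float in [-1.0, 1.0)) to a 16-bit integer sample."""
    step = 1.0 / 32768.0                      # one 16-bit quantisation step
    if dither:
        # TPDF dither: sum of two uniform randoms, +/- 1 LSB peak-to-peak
        x += (random.random() - random.random()) * step
    q = round(x / step)
    return max(-32768, min(32767, q))         # clamp to the 16-bit range

# Theoretical dynamic range is roughly 6.02 dB per bit:
print(round(6.02 * 16, 1))   # 96.3  (dB, for 16 bits)
print(round(6.02 * 24, 1))   # 144.5 (dB, for 24 bits)
```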
On dynamic range
- Katz also points out that 16-bit audio can have a much greater perceptible range than the theoretical 96dB (thanks to dither and noise shaping), pretty much showing the same response as this article
- most DAWs, whilst nominally running at a 24-bit project bit depth, effectively use double precision for their processing (48 bits is typical inside Pro Tools) but continually convert between 24 bits and 48 bits at the interface to plug-ins
- the above is not to be confused with using a 32-bit floating point file format; different beast, different purpose
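A toy illustration of why the internal bus needs to be wider than the project bit depth (my own sketch, nothing to do with how Pro Tools is actually implemented): multiplying two 24-bit fixed-point values produces up to 48 bits of exact result, and chopping that back to 24 bits throws information away at every hop.

```python
# Toy illustration (mine): the exact product of two 24-bit fixed-point
# values needs up to 48 bits, which is why a DAW's internal mix bus is
# wider than its nominal 24-bit project depth.

a = 0x600001            # two arbitrary 24-bit sample values
b = 0x400003
product = a * b         # the exact product occupies up to 48 bits
print(product.bit_length())        # 45 - more than 24 bits needed here

# Discard the low 24 bits, as a truncation back to project depth would:
truncated = (product >> 24) << 24
print(product - truncated != 0)    # True: the truncation lost information
```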
The paper from the Boston Audio Society is a reasonable attack on the assertions made by high-def audio pundits and only confirms that during playback of the completed project we do not need more than 16/44.1
The second paper is more controversial, in my opinion, as the target of the investigation was to determine whether ultrasonics could be detected - i.e. they were not applying an LPF to the signal before playback. What they found was that the intermodulation distortion produced by trying to reproduce the higher-order harmonics of a 2kHz tone (the 11th, 13th, 15th, 17th and 19th, i.e. 22kHz through 38kHz) on a single speaker introduced perceptible even harmonics (2nd, 4th, 6th, 8th, 10th, 12th, 14th and 16th) between each odd harmonic. This made it possible to distinguish the stimuli when presented on a single speaker, whereas the same stimuli were imperceptible when presented through an independent speaker for each of the target harmonics. In other words, unless non-linearities were introduced during reproduction, the signals could not be detected reliably by the test subjects. For me, the bigger problem is that with the small number of participants involved, no statistical analysis applied to the results has much validity. The 79.4% correct-response threshold is chosen as the point at which the experimenters believe the responses are due to more than chance (guessing)
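A quick back-of-envelope check on those numbers (my own sketch, not from the paper): the lowest-order intermodulation products of those odd harmonics are the pairwise difference tones, and they land exactly on even harmonics of 2kHz, squarely inside the audible band - higher-order nonlinearity fills in the remaining even harmonics.

```python
# Toy check (mine): the ultrasonic stimuli and the audible difference
# tones that intermodulation between them produces.

fundamental = 2_000
harmonics = [n * fundamental for n in (11, 13, 15, 17, 19)]
print(harmonics)          # [22000, 26000, 30000, 34000, 38000] - all ultrasonic

# Any nonlinearity in a single speaker mixes pairs of these tones and
# produces components at their difference frequencies:
difference_tones = sorted({abs(a - b) for a in harmonics for b in harmonics if a != b})
print(difference_tones)   # [4000, 8000, 12000, 16000] - squarely audible
```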
And finally we get to your nugget above - discrimination at 0.1dB or 0.2dB to avoid loudness inconsistencies
- the numbers quoted are not for general production but for the calibration of equipment used for testing within an experimental research center (or mastering within a production facility).
That is to say, they are not mentioned to challenge the mix engineer, but to remind us of the different purposes equipment may be used for, and of the degree to which equipment used for experimental presentation should be calibrated to ensure consistency in the presentation of stimuli
- i.e. the article is not about what to do with balancing mixes but to do with how we compare and therefore how we perceive differences when making comparisons
- if we compare apples to oranges (or worse apples to cheese) then we will not get the expected results we would get from comparing apples to other apples
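For a sense of scale on those tolerances (my own arithmetic, not from the article): converting 0.1dB and 0.2dB to amplitude ratios shows just how fine a calibration margin that is - roughly a 1-2% level change.

```python
# Quick arithmetic (mine): how small a 0.1dB or 0.2dB level difference
# is as an amplitude ratio.

def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to an amplitude ratio."""
    return 10 ** (db / 20)

print(round(db_to_amplitude_ratio(0.1), 4))   # 1.0116 - about a 1.2% change
print(round(db_to_amplitude_ratio(0.2), 4))   # 1.0233 - about a 2.3% change
```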