The Use of Numerical Scoring for Subjective Assessment:

I believe the consumer comes first. It's vital that the audio critic imagines he is a customer when evaluating a product or technology.

I generally use a numeric scale to grade the sound quality of audio electronics. Some 30 years ago I used an IEC-based 0 to 10 scale, where zero represented nil fidelity and 10 was assigned to essentially perfect reproduction. Given a large and representative group of products, 5 points was defined as the average for sound quality. Maintaining this historic standard proved difficult as equipment improved and the best product scores edged upwards towards 8 and 9. Differentiating between excellent products demanded ever finer resolution as the scale was inexorably, logarithmically crushed towards 10, that pre-chosen limit of perfection.

My solution was to remove the barrier imposed by the upper limit of 10 and allow the numeric scores to rise in proportion to the gauged linear improvement over previous references (which I maintain in my own equipment store for long-term consistency).

This action resulted in considerable controversy. Compounding carefully judged perceived improvements on a percentage basis has resulted in the very best modern amplifier scoring at the 100-point level. Thus low-grade, marginal-quality integrated amplifiers score as they always did in referenced listening, at the 5 to 6 point level, while worthy ‘Hi Fi’ alternatives deliver in the 12 to 18 range. Meritorious power amplifiers lie in the 18 to 30 range, while top-of-the-line references may reach over 70 points. Achieving these highest levels requires commensurate performance from all other elements in the listening chain, and sufficient patience to optimise the test system so that the full potential of the device under test is extracted. Archive scores are published on HIFICRITIC.COM.
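The arithmetic of an open-ended, compounding scale can be illustrated with a short sketch. It is not the scoring procedure itself, only a rough model under stated assumptions: the function names, the 5-point starting reference and the uniform 20% improvement per step are all hypothetical figures chosen for illustration, not values taken from any review.

```python
import math


def compound_score(reference_score, improvements):
    """Compound a series of fractional improvements onto a reference score.

    `improvements` holds judged gains as fractions, e.g. 0.20 for "20% better
    than the previous reference". All figures used here are hypothetical.
    """
    score = reference_score
    for gain in improvements:
        score *= 1.0 + gain
    return score


def steps_needed(reference_score, target_score, gain_per_step):
    """Smallest number of equal compounded steps needed to reach the target."""
    ratio = target_score / reference_score
    return math.ceil(math.log(ratio) / math.log(1.0 + gain_per_step))


# Hypothetical example: a 5-point reference and four successive 20% gains.
after_four = compound_score(5.0, [0.20] * 4)
print(f"Score after four 20% gains: {after_four:.2f}")

# How many such gains would take a 5-point product to the 100-point level?
print("Steps from 5 to 100 points:", steps_needed(5.0, 100.0, 0.20))
```

On these assumed figures, a long run of modest, individually judged gains is enough to carry scores well beyond the old ceiling of 10, which is why a fixed upper limit could no longer separate the best products.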

For loudspeakers, the performance variations are of a different order, and their analysis and judgement become a complex balance for each design, leavened by an allowance for individual taste and for variations in both room acoustics and room matching. I have found numeric scores to be inappropriate here. If larger-scale multiple product testing were to be reintroduced, however, sufficient comparative data would be available for reliable scoring and for the construction of rank orders, both overall and for individual parameters of performance.

