Thoughts on A/B testing

With electronics, A/B testing can be valid if you can be sure you are listening at exactly the same volume. For stereo speakers it is practically impossible, because they only sound their best when properly installed to within about ±5 cm, with the correct distance between them, distance to the front wall, toe-in angle and so on. A good loudspeaker designer mentions this in the setup manual. And yes, different speakers need very different positions in a listening room, so one can't just place them on the same spot when doing A/B listening.
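For the electronics case, at least, the volume matching can be done by measurement rather than by ear. A minimal sketch, assuming the two devices' outputs have been captured as NumPy arrays (the 1 kHz signals below are synthetic stand-ins of my own, not real captures):

```python
import numpy as np

def rms_db(x):
    """RMS level of a signal, in dB (relative units)."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))

def gain_to_match(reference, candidate):
    """Linear gain to apply to `candidate` so its RMS level matches `reference`."""
    return 10 ** ((rms_db(reference) - rms_db(candidate)) / 20)

# Synthetic stand-ins for captures of the same 1 kHz test tone from two devices
t = np.arange(48000) / 48000
a = 0.50 * np.sin(2 * np.pi * 1000 * t)
b = 0.42 * np.sin(2 * np.pi * 1000 * t)

g = gain_to_match(a, b)
print(f"offset before matching: {rms_db(a) - rms_db(b):+.2f} dB")      # ~ +1.51 dB
print(f"offset after matching:  {rms_db(a) - rms_db(g * b):+.3f} dB")  # ~ +0.000 dB
```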

One complicated but more valid way to compare two pairs of loudspeakers is to optimise the setup of one pair at a time in the room, marking the exact spots on the floor with tape, and then carrying the speakers out of the room when comparing them to another pair (also optimised, with tape on the floor marking its best-sounding spot).

It will take some time to do comparisons this way, but the result will be more valid.
Harman has a setup in place for this. Not really practical in our own homes, though.
Can't we just use one of these? Treat it as part of the control, since it does not change between tested configurations other than flipping a switch for the input. We'd possibly also need a preamp ahead of the two amps to pick which amp gets the output from the source, I guess.


-Ed
 
Peer-reviewed papers are only as good as the peers who review them. I have first-hand knowledge of a PhD student who wrote a thesis full of inaccuracies about work done for a company I worked for. His supervisor, who I assume reviewed it, seemed to have no issue with it. One of his main conclusions was that the toroidal core of one of our variable transformers could be optimised by reducing the cross-sectional area to half its current size, which was clearly BS, but his supervisor apparently believed it 🤷‍♂️
Peer review for publications is done by outside experts, not by the author's supervisor. But, yes, peer review is not perfect. It does, however, show a level of support from other experts in the field. There are a lot of scientific papers published these days without peer review. They are often approached with more skepticism than peer-reviewed papers. Peer review is an important part of the research process. But, as is always the case in science, anyone who does not agree with the conclusions of a paper is free to counter that paper with data of their own.
 
Peer review for publications is done by outside experts, not by the author's supervisor. But, yes, peer review is not perfect. It does, however, show a level of support from other experts in the field. There are a lot of scientific papers published these days without peer review. They are often approached with more skepticism than peer-reviewed papers. Peer review is an important part of the research process. But, as is always the case in science, anyone who does not agree with the conclusions of a paper is free to counter that paper with data of their own.
How can you tell that a paper has been peer reviewed?
 
It comes across as "we have scientific proof that everything matters, always". That's simply not the case.
I got a different impression from the video. What I heard was that our hearing/perception system is very sophisticated, much more sophisticated than most people realize, and that these complexities need to be considered in addition to the standard measurements like frequency response and noise level. There are many people who claim that if you cannot see a difference in these standard measurements then there must not be any difference in the sound. His point is simply that this is a very simplistic view of a very complex hearing/perception system. Most of the video was just examples that helped prove that point.

I thought the experiment of removing the first few microseconds of a recording and having the people who played it try to identify the instruments was particularly interesting. It showed that timing down to the microsecond was important. That is far outside the normal 20 Hz to 20 kHz range that many people take as the definitive test of hearing ability. It is certainly a very unexpected result, and I would love to know more about that experiment.

My take from the video is not that all things matter, but that more things matter than the simple test measurements typically reported and relied on by some people. I agree with that conclusion.
 
How can you tell that a paper has been peer reviewed?
The website for the journal should discuss the review process.

There are currently two types of scientific publications. The old, traditional ones almost always have peer-review systems, and publication of an article in such a journal means the paper has passed peer review. The newer type is the journal where you pay to have a paper published; those pay-to-publish journals typically are not peer reviewed. It is a very easy way to accumulate a lot of publications, which is good for a resume.

In this case, Applied Acoustics is one of those old-style publications, and the review process is first by one of the editors and then by a minimum of two outside experts.

From their website:

"This journal follows a single anonymized review process. Your submission will initially be assessed by our editors to determine suitability for publication in this journal. If your submission is deemed suitable, it will typically be sent to a minimum of two reviewers for an independent expert assessment of the scientific quality. The decision as to whether your article is accepted or rejected will be taken by our editors."

More details on the process can be found here.

Guide for authors - Applied Acoustics - ISSN 0003-682X | ScienceDirect.com by Elsevier
 
I got a different impression from the video. What I heard was that our hearing/perception system is very sophisticated, much more sophisticated than most people realize, and that these complexities need to be considered in addition to the standard measurements like frequency response and noise level. There are many people who claim that if you cannot see a difference in these standard measurements then there must not be any difference in the sound. His point is simply that this is a very simplistic view of a very complex hearing/perception system. Most of the video was just examples that helped prove that point.
From looking at the video and the paper, it seems to me the strong messages in the video do not match the messages in the paper 100%; so, at least as far as I understood it, there appears to be a level of interpretation introduced by the author of the YT video.

In any case, I'd suggest anyone interested in watching the video not take it for granted, but take the time to read the paper it is based on as well. I'd also encourage people to read other academic papers on similar topics, because not all researchers are of the same mind.

I don't see myself as competent enough to comment on the details of the content of the paper itself; the scientific community will either accept and build upon, reject, or ignore its conclusions.

What I heard was that our hearing/perception system is very sophisticated, much more sophisticated than most people realize
This part is true. I don't believe anyone who has done any kind of research in audio would try to negate it. But the human auditory system has its limits too, and these are not quite as poorly understood as is sometimes suggested. Not everything is understood either, of course.

these complexities need to be considered in addition to the standard measurements like frequency response and noise level.
In my experience professionals in the audio field try to do this.

Please also note that we are able to measure much more than just frequency response and noise level. A typical measurement suite will contain quite a few additional tests, depending of course on what kind of device is being tested.

However, in my experience it quickly becomes clear to most involved in auditory perception research that some of the tests have much more weight than others, because people tend to hear certain types of deviations much more easily than others. It would be silly to ignore this.

There are many people who claim that if you cannot see a difference in these standard measurements then there must not be any difference in the sound.
People claim all sorts of things. These claims are sometimes justified, but not always. 🤷‍♂️

Most of the audio research articles I've read look quite different. To be honest, I haven't seen many (any?) serious research articles where some random device is measured and claims are made about how it sounds without listening tests.

In fact, many research articles in audio are very much focused on (controlled) listening tests. A lot of audio research over the last 100 or so years has looked at the mechanics and limits of human auditory perception, what causes audible differences, and what people prefer on average.
Results of such research are sometimes used to establish reasonable metrics/targets for device engineering. That is, IMO, a reasonable thing to do.

I thought the experiment of removing the first few microseconds of a recording and having the people who played it try to identify the instruments was particularly interesting. It showed that timing down to the microsecond was important. That is far outside the normal 20 Hz to 20 kHz range that many people take as the definitive test of hearing ability. It is certainly a very unexpected result, and I would love to know more about that experiment.
Are you aware that removing the first few microseconds of a recording (i.e. the initial transient) also changes the recording's spectrum?
Note that the time-domain waveform and the frequency-domain magnitude+phase response are inextricably and directly linked: if you change one, you also change the other (this is the Fourier transform relationship in mathematics). So this may not be as unexpected as it seems at first glance.
But it is absolutely an interesting test, and I'm sure it is valuable in better understanding human perception.
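
To make the Fourier point above concrete, here is a minimal sketch. The synthetic "plucked note" and the 1 ms cut are illustrative choices of mine, not the stimuli from the actual experiment:

```python
import numpy as np

fs = 48000                       # sample rate in Hz
t = np.arange(0, 0.1, 1 / fs)    # 100 ms of signal

# A crude plucked-note stand-in: 1 kHz tone with a 2 ms attack and exponential decay
note = np.minimum(t / 0.002, 1.0) * np.exp(-t / 0.03) * np.sin(2 * np.pi * 1000 * t)

# Remove the initial transient (first 1 ms), keeping the overall length the same
cut = note.copy()
cut[: int(0.001 * fs)] = 0.0

# The edit is local in time, but its effect on the magnitude spectrum is global
spec_note = np.abs(np.fft.rfft(note))
spec_cut = np.abs(np.fft.rfft(cut))
rel_change = np.mean(np.abs(spec_note - spec_cut)) / np.max(spec_note)
print(f"mean spectral change relative to the peak bin: {rel_change:.4f}")
```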

My take from the video is not that all things matter but that many more things matter than the simple test measurements typically reported and relied on by some people. I agree with that conclusion.
People tend to oversimplify, I agree. But unfortunately people also tend to overcomplicate. Being human is filled with paradoxes. :D

Lastly, it is clear to me that most audio hobbyists are unlikely to be willing to go through the hassle of setting up a truly rigorous listening test just to see which device they prefer. That is IMHO reasonable and completely understandable. People are free to select whichever methodology they like when choosing devices to buy and own.

The part that always surprises me is not this; it is the fact that so many people in this hobby seem to think that completely uncontrolled listening tests are just as valid as (or even more valid than) controlled listening tests for comparing audio devices, and are willing to argue this to death, most without ever having truly researched or experienced the alternative. :confused:
 
I am not going to get into yet another long discussion about this topic. It has been beaten to death a million times in countless forums, including this one, and usually ends up in a tit-for-tat discussion that goes nowhere.

My points are simple:

Some people claim

1) if the published measurements (like in ASR) are at a certain level for two devices, it is impossible for a person to hear a difference. And yes, many people do take that position.

2) the ONLY way to compare two pieces of equipment is in a strictly controlled A/B test, almost always doing the comparison on the scale of minutes.

My posts and the video I linked to simply question those positions with scientific observations from reputable sources.

The video and the paper it is based on provide a lot of interesting and relevant information about this topic. Are they the definitive work for all time on this topic? Of course not. Are there things in the analysis that might be wrong? Of course. But they provide a lot of valuable information on this topic, and a lot of evidence against the position that published measurements and A/B tests are the only things relevant to what people hear from their systems.

Bye for now.
 
Some people claim

1) if the published measurements (like in ASR) are at a certain level for two devices, it is impossible for a person to hear a difference. And yes, many people do take that position.

2) the ONLY way to compare two pieces of equipment is in a strictly controlled A/B test, almost always doing the comparison on the scale of minutes.

Firstly, these are two different topics, though obviously linked.

The reason for [1] (devices measuring beyond a certain threshold will sound identical) is based on multiple tests for [2] (blind A/B testing).

That is to say, multiple subjects are played pieces of music with various levels of distortion, A/B'd against the same music with no distortion. Ditto for noise and for deviations from a flat frequency response. By doing this many, many times, we know the minimum levels of noise, distortion and frequency variation that someone can hear.
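
For what it's worth, here is a minimal sketch of how the outcome of one such blind A/B (ABX) session is typically scored with an exact binomial test; the 16-trial numbers are hypothetical, and only the Python standard library is used:

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided exact binomial p-value: the probability of scoring at least
    `correct` out of `trials` by pure guessing (chance = 50% per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical listener: 12 correct identifications out of 16 trials
print(f"12/16: p = {abx_p_value(12, 16):.3f}")  # ~0.038, unlikely to be guessing
# 9/16, by contrast, is entirely consistent with guessing
print(f" 9/16: p = {abx_p_value(9, 16):.3f}")   # ~0.402
```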

So I fully subscribe to [1]. However, I am also aware that some people may claim to hear things beyond these limits. By definition, the only way they can prove this is to conduct A/B tests which demonstrate that they can.

As for Kunchur's paper, it's been heavily discussed and criticised. Just because he published it, doesn't mean he's right. He's not a specialist in the field, he's a physicist, and audio is his hobby.

All of this is really not relevant. Again, we need to come back to basics. We know the limits of human hearing. The levels of noise and distortion which we can detect are well-documented, and thoroughly researched, using the scientific method.

If you don't trust them, please don't ever climb on a plane again.

But let's not mix these two things up.

We all know about bias. If we've ever felt we can detect an audible difference between two DACs which shouldn't be there, we must have at least some concern that bias might be the reason. A/B testing is simply a way of telling whether you can really hear a difference or not.

And I remain highly suspicious of anyone who bends over backwards to think of an excuse not to do so.
 
Firstly, these are two different topics, though obviously linked.

The reason for [1] (devices measuring beyond a certain threshold will sound identical) is based on multiple tests for [2] (blind A/B testing).

That is to say, multiple subjects are played pieces of music with various levels of distortion, A/B'd against the same music with no distortion. Ditto for noise and for deviations from a flat frequency response. By doing this many, many times, we know the minimum levels of noise, distortion and frequency variation that someone can hear.

So I fully subscribe to [1]. However, I am also aware that some people may claim to hear things beyond these limits. By definition, the only way they can prove this is to conduct A/B tests which demonstrate that they can.

As for Kunchur's paper, it's been heavily discussed and criticised. Just because he published it, doesn't mean he's right. He's not a specialist in the field, he's a physicist, and audio is his hobby.

All of this is really not relevant. Again, we need to come back to basics. We know the limits of human hearing. The levels of noise and distortion which we can detect are well-documented, and thoroughly researched, using the scientific method.

If you don't trust them, please don't ever climb on a plane again.

But let's not mix these two things up.

We all know about bias. If we've ever felt we can detect an audible difference between two DACs which shouldn't be there, we must have at least some concern that bias might be the reason. A/B testing is simply a way of telling whether you can really hear a difference or not.

And I remain highly suspicious of anyone who bends over backwards to think of an excuse not to do so.
I said I am done with this, but I think it is important to question some of your statements.

Kunchur is a physicist who does specialized research in audio and acoustics. He is far more than a hobbyist. Hobbyists do not publish peer-reviewed papers in Applied Acoustics, complete with 218 references. He has also given invited talks for the Audio Engineering Society, a professional society of audio engineers and scientists, many of whom work in the audio industry. Again, he would not be invited to do that as a mere hobbyist. And if you want to question his research, it should be done with the same scientific approach that he used, complete with references and peer review. I suggest you read the paper before you reject him as a hobbyist. Much of the paper is pretty technical, so you might want to jump down to the conclusions and also the paragraph before the conclusions.


Secondly, the testing you describe of "noise, distortion and frequency variations" is far from a complete test of all known hearing/perception attributes of the human brain. It only tests human hearing for those specific attributes, which, in and of themselves, do not completely define human hearing. The idea that "we know the limits of human hearing" is certainly questionable. As Kunchur points out (with references to other research) in his paper, there are much more subtle effects that contribute to human hearing/perception, including timing changes down to the microsecond time frame.

He also points out, with references, that long-term testing and training of the brain are necessary for full comparisons of different audio setups, and that short-term A/B tests are not adequate for full comparisons. Please read his analysis of this.

By the way, the research the video cites on removing the first few microseconds of a recording is quite interesting and certainly shows that there is a lot more to human hearing than "noise, distortion and frequency variations". I am still trying to find the referenced paper.

So, I would suggest that you get past the same old arguments and study the science behind what researchers like Kunchur and others are saying.

There - we have solved the subjectivist versus objectivist argument once and for all. That should cut down the number of posts on audio forums by at least 50% :) Well, maybe not.

And, by the way, I am also a Ph.D. physicist, but I am just an audio hobbyist.
 
Secondly, the testing you describe of "noise, distortion and frequency variations" is far from a complete test of all known hearing/perception attributes of the human brain. It only tests human hearing for those specific attributes, which, in and of themselves, do not completely define human hearing.

Okay, go on then. What are the others?

Just while we're discussing that, let's say these unknown factors exist - I call them pixies.

Isn't it strange that, even though no one knows what they are, high-end manufacturers have been able to negate them, without knowing what it is they're trying to negate; and that, whilst they really don't know what it is they're doing, they know exactly how to do it, and it costs a lot of money.

And no one has ever said or leaked what that something they're doing is, nor has it ever been discovered in a teardown of their products.

And if anyone tries to discover what this unknown, unknowable, undetectable, uncopyable, but highly expensive factor fix is, it's undetectable in A/B testing, or blind testing performed over time.

Nah. Pixies. :ROFLMAO:
 
Okay, go on then. What are the others?

Just while we're discussing that, let's say these unknown factors exist - I call them pixies.

Isn't it strange that, even though no one knows what they are, high-end manufacturers have been able to negate them, without knowing what it is they're trying to negate; and that, whilst they really don't know what it is they're doing, they know exactly how to do it, and it costs a lot of money.

And no one has ever said or leaked what that something they're doing is, nor has it ever been discovered in a teardown of their products.

And if anyone tries to discover what this unknown, unknowable, undetectable, uncopyable, but highly expensive factor fix is, it's undetectable in A/B testing, or bling testing performed over time.

Nah. Pixies. :ROFLMAO:
Bling testing is even better than blind testing and describes it perfectly 🤣.
 
Bling testing is even better than blind testing and describes it perfectly 🤣.

HA! Many thanks. Corrected.

As an aside, I'm a firm believer that people who use the word 'bling' do so for two reasons.

1 - They can't spell ostentatious.
2 - They're unaware that a tendency to ostentation is a negative and unpleasant character trait.

:ROFLMAO:
 
Sometimes it's a good thing to compare electronics; it might turn out that an upgrade is unnecessary.

When I compared the WiiM Ultra + Rega DAC-R with a Linn Klimax DS/0 over a couple of days, I thought the Linn would walk all over the WiiM combo. That was not the case: they sounded very similar (both very good), and the volume control in the WiiM sounded slightly better at lower levels than the one in the Klimax.
 