Please cite the meta-research that reaches this conclusion, or a number of such studies with samples large enough to be sound.
The minimum sample size for representative research is 100 to 300 participants per group (experimental and control group), and both groups must be the same size. According to a quick AI lookup (Microsoft Copilot), these participants should be between 18 and 30 years of age. So a reliable test would need at least 200 to 600 participants in total, ideally spread across different racial groups and sexes. And before the test could even start, each participant's hearing would need to be tested.
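For context, here is a rough sketch of where per-group numbers like these usually come from. It rests on my own assumptions, not on the Copilot lookup: a two-sided comparison of two equal-sized groups, the standard normal-approximation power formula, a significance level of 0.05, 80 % power, and an effect size given as Cohen's d. Under those assumptions, 100 to 300 participants per group corresponds roughly to detecting effects between d = 0.4 and d = 0.23.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants needed per group (assumed two-sided,
    two-sample comparison, normal approximation, equal group sizes)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = std_normal.inv_cdf(power)           # quantile matching the desired power
    # n = 2 * (z_alpha + z_beta)^2 / d^2, rounded up to whole participants
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)


for d in (0.5, 0.4, 0.25):
    n = n_per_group(d)
    print(f"d = {d}: {n} per group, {2 * n} in total")
```

Smaller effects push the numbers up fast: halving d roughly quadruples the required group size, which is exactly why such tests get expensive.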
I am not against repeatable, scientifically proven results. However, I have a problem with the quick "Did you do a double-blind A/B test?" bullet that gets fired around here so readily. Anyone who chooses the scientific approach should know that it is expensive in both money and effort, and thus impossible for us forum users to achieve (maybe someone lurking here can do that, or has done it, but I guess you all get what I mean).