Blind faith

Blind listening testing of stringed instruments

(Published in The Strad, February 2007)

What's in a label? Would a Strad sound as sweet by any other name? Blind tastings, popular in the wine world, offer a method of objective evaluation, but the string world doesn't believe in such tests. Alan Coggins wonders why.

The appreciation of fine violins is often likened to that of great wines. Both have the ability to stimulate and please our senses. Experts, connoisseurs and enthusiasts enjoy sampling and discussing their respective merits - a great Bordeaux vintage is treated with as much reverence as a golden-period Stradivari. The common wisdom is that both wines and violins improve with age. As John Townsend Trowbridge wrote:

With years a richer life begins,
The spirit mellows:
Ripe age gives tone to violins,
Wine, and good fellows.

There is even a national superiority associated with each product, so that the ultimate pleasure apparently comes from sipping a French wine while listening to the silky tones of an Italian violin.

However, there is one area where the comparison between violins and wines breaks down. Centuries ago, the wine industry realised that perception and objectivity could easily be influenced by expectation. The slightest glimpse of a label, even a cork or capsule, would be enough to unconsciously bias judgement. In order to impartially compare and rank their products the wine makers developed a strict methodology based on the need for complete anonymity: blind tastings. The violin world has yet to fully embrace this concept.

Of course, the big danger with blind trials is that they can occasionally produce embarrassing results. One well-known example occurred at the 1976 Paris Wine Tasting. In the preceding years, California had begun producing some impressive wines that were achieving favourable scores in local wine tastings, although the results were widely dismissed: critics suggested that the judges were biased, or claimed that the competing French wines must have suffered on their journey across the ocean.

Then, in 1976, a wine merchant organised a tasting in Paris as a marketing exercise to promote US wines. This time the French had the home-ground advantage and it was the Americans who had to send their wines overseas. Nine French wine experts acted as judges in a blind tasting. The first shock came when the best white was found to be Californian (in fact, Californian whites took three of the top four places). But a greater humiliation was to follow: the premier red was also Californian, from the Napa Valley, and it outscored the likes of Château Mouton-Rothschild, Château Haut-Brion, Château Montrose and Château Léoville-Las Cases. The results stunned the wine community.

Although similar blind listening tests of violins and cellos are carried out with some regularity, their progress invariably follows a well-trodden and predictable course. The trial compares new against old, ideally including some famous and highly priced classical instruments (the inclusion of a Strad will usually mean mainstream media coverage). The results show that new instruments stand up very well and often outscore their older, more expensive counterparts. The test is then discredited and dismissed as meaningless by the experts.

A typical example is the recent trial in Sweden, which was reported and discussed in The Strad (News, June 2006 and Letters, July 2006). In this case, violins made by three modern Swedish makers were compared to a Stradivari, a Gagliano and a Guadagnini. All six instruments were played by two professional players and the sound judged and scored by an audience mostly comprising members of the European String Teachers Association. A modern violin by Peter Westerlund obtained the highest score.

A common criticism of such tests is that they are unscientific and rely on flawed methodology - most notably, they are rarely conducted in the widely preferred double-blind format. Double-blind testing means that neither the subjects (the panel of judges) nor the person carrying out the experiment (in this case the player) knows the identity of the violins that are being evaluated. The Swedish trial was only single-blind - the performers knew what they were playing and may have introduced some sort of unconscious bias.

One trial that was double-blind was organised by Robert Cauer at the Fourth American Cello Congress in 1990. This time an audience of about 140 musicians judged the sound of 12 cellos: six new and six old (a Gagliano, two Gofrillers, a Montagnana, a Stradivari and a Tecchler). The player was blindfolded and a linen screen was used to hide the cellos from the audience. Instruments were only identified as new or old. The top-scoring cello was found to be old, with the second, third, fourth and fifth places going to new cellos. As a group, the modern cellos earned higher scores than the older ones.

But there is another objection that can be raised about both of these tests: the quality of the judging panel. Organisers of wine shows don't select their judges from the front bar of the local pub, simply because they look like they might enjoy a drink. Similarly, asking the average music lover or even player to take part in a complex listening test could be seen as equally meaningless. To quote pianist James Boyk, any such test might just be a case of 'the double-blind leading the double-deaf'.

And there can be no doubt that judging sound is a very difficult and complex task. According to violin maker Joseph Curtin: "The simple truth is that it's difficult to evaluate violin sound even in the best of circumstances. I think this is in large part because we are not trained at it. Some people do this sort of thing naturally - most of us need training. Unfortunately no one has yet developed a method for learning to hear the violin in terms of its individual tonal components - prominent resonances, the balance between different frequency regions, and so on (at least in part because we don't yet understand how they all fit together). Until some such method is developed, it will remain difficult to talk about violin sound in an objective way."

Listening trials are typically held in front of an ad hoc group - sometimes it might be an audience assembled for a concert, sometimes professional musicians, string teachers and so on. But there is never a requirement for the judges to be in any way 'proven' to have any sort of talent at recognising and evaluating sound. These tests may say a lot more about the lack of discernment in most people's hearing than the sound of the instruments themselves.

Unfortunately there is no equivalent in the music world to the Master of Wine, and we have no qualified sound judges to call upon. Even people who spend a lot of time practising and applying these skills can still find it a challenge. According to Curtin: "At the VSA Oberlin Acoustics Workshop this summer, we did a series of blind tests modelled on the ABX format used by wine-tasters. Most of us found it difficult, if not impossible, in this concert-hall setting to consistently identify instruments we had all agreed were very different tonally. A sobering experience!"

Another example of a listening test (again, single-blind) was arranged and broadcast in the mid-1970s by the BBC. In this case the quality of the judging panel was high - Charles Beare, Isaac Stern and Pinchas Zukerman. The instruments played were a Stradivari, a Guarneri del Gesù, a Vuillaume and a Ronald Praill that was barely a year old. Two excerpts were played by Manoug Parikian: the start of the Bruch G minor Concerto and a segment of the Bach Chaconne.

Before giving their answers, the judging panel spent some time pointing out many of the deficiencies in the testing procedure. The two excerpts played were too short and limited in tonal possibilities, there was no chance to revisit each instrument for extended comparisons, the studio represented only one of many possible listening environments, and so on; all very valid comments as the test was undoubtedly far too limited. However, the good-natured panel proceeded to give out their judgements, many of which proved to be incorrect - the Praill was mistaken for both the Strad and the Guarneri (Beare and Stern did the best with two out of four correct).
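As a rough sanity check on that score, it helps to know what pure guessing would achieve. The sketch below assumes, which the account of the broadcast does not confirm, that each judge had to assign the four names to the four violins once each (a matching task). The function name `chance_of_at_least` is illustrative only.

```python
from fractions import Fraction
from itertools import permutations


def chance_of_at_least(correct: int, instruments: int = 4) -> Fraction:
    """Probability that a random one-to-one guess names at least
    `correct` of the instruments rightly (assumed matching task)."""
    truth = tuple(range(instruments))
    guesses = list(permutations(truth))  # every possible way to assign the names
    hits = sum(
        1
        for guess in guesses
        if sum(g == t for g, t in zip(guess, truth)) >= correct
    )
    return Fraction(hits, len(guesses))


print(chance_of_at_least(2))  # 7/24
```

Under that assumption, a judge guessing at random would get two or more right about 29 per cent of the time, so a score of two out of four cannot by itself be distinguished from luck.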

Is it possible, then, to design a meaningful test that would satisfy all parties? Cambridge University researcher Jim Woodhouse thinks so and his studies on the virtual violin project are based around that premise. "There is a well-established body of scientific techniques for doing this kind of test in a systematic way, which can produce useful and repeatable results." But he cautions: "To do anything approaching real science, you need to start with the easy questions and work up gradually. Can people reliably tell any instruments apart? How gross does the difference have to be? What kind of playing is best for bringing out differences? How many repeat tests are needed before the results have any statistical significance? It makes sense to explore these questions first with instruments which are very different, to map out the ground and refine the testing method. Then you can move on to more subtle and elusive differences."
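Woodhouse's question about the number of repeat tests can be made concrete with a little arithmetic. The following is a minimal sketch, not his method: it assumes each trial is a simple two-way choice (for example, 'is this violin A or violin B?'), so that a guessing listener is right half the time, and it uses the conventional 5 per cent significance threshold. The function names are hypothetical.

```python
from math import comb


def p_value_at_least(correct: int, trials: int, chance: float = 0.5) -> float:
    """Probability of scoring `correct` or more out of `trials` by guessing alone."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def min_correct_for_significance(trials: int, alpha: float = 0.05, chance: float = 0.5):
    """Smallest score that guessing reaches with probability below `alpha`.

    Returns None when even a perfect score is not rare enough.
    """
    for correct in range(trials + 1):
        if p_value_at_least(correct, trials, chance) < alpha:
            return correct
    return None


for n in (4, 8, 16):
    print(n, "trials ->", min_correct_for_significance(n), "correct needed")
```

Under these assumptions a listener would need 7 out of 8, or 12 out of 16, before guessing becomes an unlikely explanation. Four trials can never reach the 5 per cent level, since even a perfect score has a 1 in 16 chance of arising by luck.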

In the study, Woodhouse and his colleagues are aiming to use small incremental changes to a 'virtual violin' (a digital sample that can be modified in a controlled way). He is hoping that psycho-acoustical testing will be able to indicate the threshold for detection of any such changes and provide a basis for evaluating quality judgements made by listeners.

For some people, though, the feeling is that the denigrators of listening tests might be too intent on finding problems and perhaps they 'doth protest too much'. David Burgess is a successful new maker (and former restorer) who would like to see these trials given more weight. He says: "I won't attempt to argue that there are no differences in sound between classic Italian and some modern instruments, or between any two instruments for that matter. My opinion and experience, though, is that even musically educated audiences listening in double-blind tests are repeatedly unable to conclude that old Italians are superior. One can always find some reason to invalidate any test, but at some point it seems like the preponderance of evidence might prevail."

Perhaps the real answer, though, lies not so much in the actual sound that is produced, but more with some intangible interaction between the player and the instrument. When asked on the BBC programme why great players seek out the top Cremonese instruments, both Stern and Zukerman answered with one word in unison: 'security'. Charles Beare added: "The difference between a great instrument and a good instrument is what it does for the player." Woodhouse agrees and suggests that any rigorous testing procedure should also include input from the player: "There is no doubt that a blindfolded player is much better at recognising instruments than most listeners."

But there are also some players who feel quite comfortable playing on a new instrument. Christian Tetzlaff uses a modern violin by Stefan-Peter Greiner and said in an interview in The Strad (July 2005): "If I were to play a Strad and a Guarneri in a double-blind test with my Greiner, I am sure that no one could tell which was the new instrument. When I play with orchestras, if they don't know what I am playing, they always ask if it's a Strad or Guarneri."

As a group, new instrument makers are not noted iconoclasts - they still revere the Cremonese makers and aspire to a Cremonese sound. In fact, Peter Westerlund seemed more amazed by the Strad coming last in the Swedish test than by his own violin finishing first. Talk to violin makers and they will invariably tell you about specific old instruments they have heard that proved to be both a revelation and an ongoing inspiration in their approach to the craft.

But that doesn't mean they automatically accept that all old Cremonese instruments are universally wonderful. Burgess sums it up: "My opinion from everything I've heard, played, and the musicians I've talked to? I won't go so far as to say that 'the Emperor isn't wearing any clothes', but I do think there might be a fat man in a Speedo in this parade."

Evaluating and judging sound need not be confined to these divisive 'new versus old' debates. There are many other interesting and useful applications that would benefit from a more rigorous and controlled testing procedure - tone judging in violin making competitions, for example, or helping players make more informed choices about their next instrument purchase. With more carefully designed trials and better-trained ears it should be possible to come up with more meaningful results.

That is not to say, though, that the findings will ever be infallible. It is probably worth keeping in mind the words of Steven Spurrier, organiser of the 1976 Paris Wine Tasting, who wrote after the event: "The results of a blind tasting cannot be predicted and will not even be reproduced the next day by the same panel tasting the same wines."