Monday, 26 November 2012

Overrated wines


I have just had a glass of a very nice chardonnay from Limoux in southern France, and in order to convey to you how good it was, I’m letting you know that I give it a score of 59! Fantastic! Oh, by the way, that score is on my 54-point scale that ranges from a minimum of 12 to 66, the maximum.

It’s just wrong, isn’t it? But why? When a newspaper or magazine wine columnist awards a wine a score of 95 points, we tend to assume that it is a great wine. We all interpret this in much the same way. That is, it is a wine that has been rated in the top 5% of all wines; a wine that leaves behind all those in the 70s, 80s, and early 90s. But when was the last time you saw a wine with a rating of even 80, much less one with a 75 score? Perhaps these wines never make the guides, magazine or newspaper columns, or blogs. Alternatively, maybe these wines don’t exist. In a system such as this, popularized by American wine critic Robert Parker, there is a dramatic change in meaning as one moves below the 90s. While a rating of 90 means a perfectly acceptable wine, a rating below 80 effectively means vinegar.


And? Well, it is important to recall that this system is supposed to be based on 100 points, that is, percentages. Think about marks at school or university. How would it feel if you scored 75 out of 100 in an exam and were considered a failure as a result? Your sense of outrage would be fuelled by the understandable belief that coming in the top quarter of potential marks meant that you were much better than those who must have scored below you.


Considering for a moment what scales are supposed to do, it is obvious that they provide an idea of the magnitude of quantities: distance, weight, height, temperature. By and large, we never have problems with these types of scales, a function perhaps of their longevity as useful tools and the fact that the quantities they measure are perceived to be properties of the ‘real, objective world’. It is when we start to measure the intangible, the subjective – thoughts, attitudes, feelings, perceptions – that things become a little trickier.


Why give things a number at all? Why can’t I just tell you that my glass of chardonnay was “bloody good”? Does a Robert Parker-type score really tell you more than this? Of course it does, you might argue, because it uses 100 points, and that gives far more discrimination between wines than a few categories. That’s true – or should be – but only if the scoring system uses the whole range of the scale. In other words, when rating wines, it is important to have wines that score 32, or 59, or 66. Use of the whole range gives meaning to the differences between scores. In simple terms, if the whole scale is used, you can be confident that a difference of one point between two wines – wherever it is on the scale – means a 1% difference. If I never use a score below 70, however, then the range of 70 to 100, or 30 points, defines the length of the scale, if it is the case that 70 means terrible and 100 means as good as wine gets (but see below). Now, a one-point difference between two wines becomes a little more than a 3% difference.
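To make the arithmetic above concrete: what a one-point difference is worth depends on the length of the range actually in use, not on the nominal 100 points. Here is a minimal Python sketch, assuming (as the paragraph does) that the lowest score used means ‘terrible’ and the highest means ‘as good as wine gets’; the function name `point_share` is hypothetical, chosen only for illustration.

```python
def point_share(low: float, high: float, diff: float = 1.0) -> float:
    """Return a score difference as a percentage of the range actually used.

    Assumes `low` means 'terrible' and `high` means 'as good as wine gets'.
    """
    span = high - low
    if span <= 0:
        raise ValueError("high must be greater than low")
    return 100.0 * diff / span


print(point_share(0, 100))             # whole 100-point scale in use: 1.0
print(round(point_share(70, 100), 2))  # only 70-100 in use: 3.33
```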


Parker actually explains his scale by dividing scores into bands of 10 points, except for the 90s, which are grouped as 90-95 and 96-100. These bands start at 50-59, which as a group are deemed ‘unacceptable’ wines. It is evident, therefore, that Parker’s scale is effectively only a 50-point scale or, if we exclude unacceptable wines, a 40-point scale. And if we are then putting wines into categories (of either 10 or 5 points), perhaps it is really a 5-category scale, or a 6-category scale if we include unacceptable wines. But it also seems that we can rate within categories. So, any “average wine with little distinction except that it is soundly made. In short a straightforward, innocuous wine” (that is, 70-79) can perhaps be rated as more (e.g., a 71) or less (e.g., a 78) innocuous.
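The banding described above can be written down directly. This is a sketch in Python, assuming integer scores and the six bands listed in the paragraph; the names `BANDS` and `band_of` are hypothetical, and only the band boundaries (not Parker’s verbal labels) are encoded.

```python
# Bands as described above: 10-point bands from 50 upward,
# except that the 90s are split into 90-95 and 96-100.
BANDS = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 95), (96, 100)]


def band_of(score: int) -> tuple[int, int]:
    """Return the (low, high) band that contains an integer score."""
    for low, high in BANDS:
        if low <= score <= high:
            return (low, high)
    raise ValueError(f"score {score} lies outside the 50-100 range")


full_range = BANDS[-1][1] - BANDS[0][0]        # 100 - 50 = 50 points
acceptable_range = BANDS[-1][1] - BANDS[1][0]  # 100 - 60 = 40 points, excluding 50-59
n_categories = len(BANDS)                      # 6, or 5 without 'unacceptable'

print(full_range, acceptable_range, n_categories, n_categories - 1)  # 50 40 6 5
# A 71 and a 78 differ by 7 points yet fall in the same 70-79 category.
print(band_of(71) == band_of(78))  # True
```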


This is not to say that a 100-point scale, even if used as such, is perfect. Imagine a scenario in which you tasted some great wines – for example, Burgundies from a recent good year such as 2002. Fantastic La Tâche – 96! Fabulous Romanée-Conti – 97! Superb Grands Échezeaux – 98! But then I slip in an older wine from one of the great vintages of the last century. And it is magnificent – easily 4 points higher than the Grands Échezeaux. Oops. This is known as a ceiling effect, and at least part of the problem is that the scale is so compressed. If all really good wines have to be given a score somewhere in the 90s (because apparently that’s where they all live), then when comparing great wines I am suddenly reduced to a 10-point scale! And, unlike the band Spinal Tap with their amplifiers (see: http://www.youtube.com/watch?v=XuzpsO4ErOQ), I cannot really just add a few points to my scale if something nicer comes along. To be fair, this is a wider problem than just wine rating (imagine rating different samples of Swiss chocolate), but such ceiling effects are reduced appreciably by using the entire scale, irrespective of what is being rated.
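The ‘Oops’ can be shown in a few lines. This Python sketch simply treats the scale as bounded at 100, so any score above it is clipped; the scores come from the imagined tasting above, and the label `older vintage` is a placeholder.

```python
CEILING = 100  # the top of the scale


def record(score: float) -> float:
    """Record a score on a bounded scale: anything above the ceiling is clipped."""
    return min(score, CEILING)


tasting = {"La Tâche": 96, "Romanée-Conti": 97, "Grands Échezeaux": 98}
# The older wine is judged 'easily 4 points higher' than the Grands Échezeaux.
tasting["older vintage"] = tasting["Grands Échezeaux"] + 4

recorded = {wine: record(score) for wine, score in tasting.items()}

intended_gap = tasting["older vintage"] - tasting["Grands Échezeaux"]
recorded_gap = recorded["older vintage"] - recorded["Grands Échezeaux"]
print(intended_gap, recorded_gap)  # 4 2
```

Half of the judged difference simply disappears because there is no headroom above 98.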

Part of the problem with wine ratings is that it is commonly believed that the numbers have an independent meaning – that is, they signify something about a wine, independent of other wines. Such beliefs are a carry-over from a style of judging based on identifying defects, common in wine judging in past years and still seen today in dairy judging. In such quality-control-type judging, a high number often means a product that is relatively free of defects. But this is not how scales of this type work. It isn’t even the way Parker-type scales are applied, as wines with obvious defects simply do not undergo the rating process (although Wine Spectator magazine’s 75-79 category is defined as including wines that are drinkable, yet still have some minor flaws).

Another aspect of wine ratings as they are commonly practiced is the view that wines can exist not only without defects, but also as perfect examples of their type. In other words, the practice of wine ratings clings to the idea of a Platonic, objective ideal of a perfect wine. If you are a wine judge, you may even have encountered wines that live in the highest 90s. But for the rest of us, we may not know a 99 if we drank a magnum of it … even if we did think that it was very good.


A major reason why we want to assign numbers to things is that it allows comparison. In science, it frequently allows very rigorous comparison via statistical analysis. But we can only do this if we know that our scale has certain properties. In rating food liking or sensory properties, scales like the 100-point wine scale are often used, and the resulting data can be used to compare products with one another statistically. We can seldom talk about these properties in the same way we talk about weight, for example. It is difficult to make statements that one product is liked twice as much as another using the scales that are commonly used. However, we can usually talk about relative degrees of difference: so, a difference between 70 and 90 on a scale ought to mean the same as the difference between 50 and 70. Without using the whole scale, however, it is not certain that such judgments could be made.
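The distinction between ratios and differences can be checked numerically. Rating scales of this kind are usually treated as interval scales, which means that any positive linear transformation (a times the score plus b, with a greater than 0) records the same judgments equally well. In this Python sketch, which uses arbitrary illustrative scores and an arbitrary transformation that squeezes 0-100 into 50-100, the ‘twice as much’ ratio changes while equal differences stay equal.

```python
def affine(x: float, a: float, b: float) -> float:
    """Positive linear transformation x -> a*x + b (a > 0), permissible for an interval scale."""
    if a <= 0:
        raise ValueError("a must be positive")
    return a * x + b


# Arbitrary illustrative scores on a 0-100 scale.
scores = {"A": 40, "B": 80, "p": 90, "q": 70, "r": 50}
# The same judgments re-expressed on a 50-100 range (a = 0.5, b = 50).
rescaled = {k: affine(v, 0.5, 50) for k, v in scores.items()}

# Ratio of values: B looks 'twice' A on one scale (2.0) but not the other (about 1.29).
print(scores["B"] / scores["A"], rescaled["B"] / rescaled["A"])

# Ratio of differences: (90 - 70) versus (70 - 50) is preserved on both scales.
print((scores["p"] - scores["q"]) / (scores["q"] - scores["r"]),
      (rescaled["p"] - rescaled["q"]) / (rescaled["q"] - rescaled["r"]))
```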


It is relatively common to compare foods from different manufacturers, if they are of the same type. We can compare Swiss milk chocolate with milk chocolates from other countries, for example. But how do we compare a French pinot noir with one from New Zealand or the USA unless they are all made to be like one another? With wines, it is difficult to compare on an equivalent basis, or like with like: methods of growing and production, climate, season, intention, aging and so on all differ. Which particular combination produces the Platonic ideal wine?

Preference, and to some extent quality (if we eliminate defects), is subjective. When a wine columnist gives a score of 96, they are valuing certain aspects of the wine in question. We might all agree that high acidity makes a wine tough to drink now, but what about astringency? A dry, puckering sensation might be a characteristic of some high quality red wines, but there are other wines that are judged as high in quality that are much softer in the mouth. You might trust Parker to tell you whether a wine was unbalanced or excessively astringent or that it will age well. You might even let him tell you that there were peach notes, and chocolate aromas, or that a wine was too redolent of “green bananas still on the tree”, but surely it is up to you as a wine drinker to decide whether it is a style that suits you.

I recently saw for the first time a wine columnist give two scores – one out of 100, which he labeled “empiric” (sic), and another out of 10 for ‘subjective’ assessment, by which he presumably meant how much he liked it. And this belated acknowledgement that perhaps describing or rating a wine ought to have something to do with preference highlights a major issue with wine ratings. Do you like the same wines as Robert Parker? The high impact of such an influential critic carries with it the implication that you ought to.


At heart, wine ratings are based on the idea that a perfect wine can be achieved, and that its perfection is independent of what wine consumers think. At least part of the unnecessary complexity and inconsistency of such ratings, as well as their absence of scientific rigor, comes from this notion. This seems odd given that wines are drunk to give pleasure, and that pleasure may vary from person to person. On one level, therefore, I can make a case that a $10 chardonnay can be just as good as a $100 chardonnay if your palate says that it is.



Monday, 22 October 2012

Learning to want


The sensory properties of foods – their tastes, odours, textures – are crucial to determining what we eat.  This is because these qualities, together and apart, evoke pleasure. So, when we talk about motivations to consume foods, it is often taken for granted that food acceptability and preferences underlie our behaviours. Of course, this is not to ignore a variety of other motivations – nutrition, convenience, and so on – but foods that are not liked are generally not eaten. And if we find a food is especially palatable, we will eat more of it.


Consider, though, what happens if you are very hungry and your food choices are limited. A plate of something that we would otherwise regard as unpalatable might still be gratefully eaten if that were all that was available. Our motivation here is driven not by liking, but by wanting.


Although this distinction between liking and wanting was first drawn in the drug addiction literature, it is increasingly recognized as important in helping to explain motivations to consume foods. On the majority of eating occasions, we want what we like, and vice versa. One major reason for this is that foods that are high in energy, either from fats or carbohydrates, are those foods that are both highly liked and stimulate wanting.


It is possible to distinguish between liking and wanting, and in some studies this has been done by contrasting ratings of liking for a food with ratings of desire to eat it. Another way is to observe facial expressions. A study by Julie Mennella [1] in which infants were fed a novel food, green beans, demonstrated how infants’ facial expressions clearly indicated dislike for this food. Following repeated exposure to eating the beans, the infants were willing to eat increasing amounts of them – a clear sign of wanting. However, their facial expressions did not change with repeated exposure to the beans – unless the infant had been fed peaches after the beans, in which case the facial expressions were much more positive. Pairing the beans with sweet peaches had conditioned a liking for the beans, resulting in a changed facial expression.


This finding, and the dissociation between indicators of liking (facial expressions) and wanting (consumption), can be understood in terms of the everyday processes of flavour-flavour learning and flavour-calorie learning. As shown in humans by Yeomans [2], ingestion of energy or other wanted nutrients, especially while hungry, conditions a liking for a food flavour. In addition, however, experiencing the conditioned food odour/flavour can also elicit increased appetite and consumption. This contrasts with the pairing of an odour/flavour with just a sweet taste (which may or may not be associated with calories), which only reliably conditions liking for that odour. While, as mentioned above, we seldom want to eat what we do not find palatable, it is highly likely that it is not this preference that pushes us to eat, but rather the engagement of wanting. In studies by Yeomans and others, odours have been conditioned through pairing with nutrients. In everyday situations, not only odours but also sights, sounds and contexts can become associated with foods.


The relevance of conditioned wanting is evident when we try to understand why we eat particular foods at particular times. Most of our eating is not done because we are severely depleted of energy or other nutrients. It is done in response to a particular amount of time having passed, or the presence of cues that remind us of food. If your stomach rumbles when you enter a kitchen where something delicious is being cooked, or when passing a bakery from which the aroma of fresh bread wafts, it is a signal that your gastric juices have been conditioned to the food odours by prior pairing of those odours with the calories that followed them.


One very plausible reason why people in affluent societies are nowadays eating so much is that our worlds are filled with a multitude of such cues to wanting that occur without our being consciously aware of them: odours, flavours, sights and sounds associated with eating. Television advertising of snack foods and confectionery is effective for a good reason: highly realistic cues for foods can elicit wanting. In a study of these effects, Ferriday & Brunstrom [3] showed that exposure to the sight and smell of pizza in a laboratory setting increased consumption of freely available pizza after participants had already consumed a fixed amount.


Another construct – hedonic hunger – has also recently been discussed as a major motivation for eating. Hedonic hunger is seen as a drive towards food pleasure-seeking that coexists with hunger driven by energy needs. It is a very similar idea to both food craving and conditioned wanting in that it is elicited by food sensory cues. It is, by definition, satisfied only by highly preferred foods. Of course, as noted above, these are the foods that, most of the time, are both liked and wanted.


The idea of hedonic hunger appears to be useful in helping to explain the drive to consume highly palatable foods when we are trying to eat a ‘healthy’ diet or one that leads to weight-loss. Dietary restriction reduces both energy intake and food pleasure, and so if we are genuinely motivated by pleasure-seeking in our eating, then this helps to explain why diets so often fail.


On the face of it, this problem ought to be addressed by good-tasting, low-calorie foods, and of course the food industry is working hard to provide these. However, widespread use of low-calorie foods may be a problem in itself. Because wanting is driven by conditioned associations between energy and flavours, we may find that we start to selectively want only high-calorie versions of foods. Just such a finding was suggested recently by O’Sullivan [4], who showed that a low-calorie, but familiar, version of a pasta dish became less and less liked relative to the regular version over repeated eating occasions. Whether this would have led to reduced consumption or reduced desire to consume was not measured. Another inadvertent consequence of the proliferation of low-calorie foods may be a reduced ability to estimate energy intake. At present, sensory properties provide important information about the calories in what we consume. So, thick, sweet, rich foods tend to be higher in calories; these same palatable qualities, uncoupled from their calorie consequences, may limit our ability to implicitly monitor our energy intake. Since most of us – but especially those trying to restrict intake – rely heavily on such cues, we may be losing an important part of our ability to monitor what we eat and, for example, to compensate for high energy intake at one meal with lower intake at another.

________________________________________________________________




Thursday, 27 September 2012

The highly discriminating consumer



In the sensory evaluation of foods or drinks, good practice dictates that the type of question being asked will determine both the method of evaluation and the sort of panelist that is required. To analyze the different odours, tastes and textures within a complex food requires considerable training in vocabulary and use of rating scales. One by-product of such analytical training is that the individuals become very sensitive to variations in the intensity of product attributes and, by extension, to differences between samples on these attributes. In effect, trained panelists are good at seeing the signal amid the noise of lots of other product qualities.


In contrast, untrained consumers are highly variable in their use of sensory terms and their assessment of attribute intensities. It is not that you can’t ask consumers about product attributes, but rather that you’ll end up with a highly variable set of numbers if you do. An implicit assumption has been that because of such variability, consumers are unlikely to be sensitive to variations between similar products or versions of products, if those differences are quite subtle.


But what if discrimination isn’t only about perceptual sensitivity? Students of Signal Detection Theory (and who isn’t?) will know that decisions are based not only on perceptual sensitivity but also on one’s criterion – essentially, one’s willingness to report something as being present or not. This is often referred to as response bias, and signal detection methods allow it to be measured independently of sensitivity.
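For readers who are not (yet) students of Signal Detection Theory, the two quantities can be computed from hit and false-alarm rates. This sketch assumes the standard equal-variance Gaussian model for a yes/no task, in which sensitivity is d' = z(H) - z(F) and the criterion is c = -(z(H) + z(F))/2, where z is the inverse of the standard normal distribution function. The function name `sdt_measures` and the two observers are hypothetical, for illustration only.

```python
from statistics import NormalDist


def sdt_measures(hit_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Return (d_prime, c) under the equal-variance Gaussian yes/no model.

    d' = z(H) - z(F)         perceptual sensitivity
    c  = -(z(H) + z(F)) / 2  criterion: c > 0 is conservative, c < 0 liberal
    """
    if not (0 < hit_rate < 1 and 0 < false_alarm_rate < 1):
        raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_h, z_f = z(hit_rate), z(false_alarm_rate)
    return z_h - z_f, -(z_h + z_f) / 2


# Two hypothetical observers with identical sensitivity but different criteria.
phi = NormalDist().cdf
observers = {
    "liberal": (phi(1.5), phi(-0.5)),       # says 'yes, it's there' readily
    "conservative": (phi(0.5), phi(-1.5)),  # needs strong evidence first
}
for name, (h, f) in observers.items():
    d_prime, c = sdt_measures(h, f)
    print(f"{name}: H={h:.2f} F={f:.2f} d'={d_prime:.2f} c={c:.2f}")
```

Both observers come out with the same d' but opposite signs of c: they differ only in their willingness to report, not in what they can perceive.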

We might also want to consider why we would want to discriminate between different products or samples at all. A few years ago, my flavourist colleague Leslie Norris and I were sitting in her kitchen discussing whether or not any improvement could be made in the way that wineries tested for the presence of cork taint in batches of wine. At that time, in order to reduce the amount of tainted wine that reached the consumer, highly trained panels were used because these individuals could be made very sensitive to the tainting compound TCA (trichloroanisole) through their training. But then we had a shared “aha” moment. Trained panels were being used to ask a question that was primarily of relevance to consumers. What if these panels were too sensitive, potentially rejecting batches of wine that consumers would find perfectly acceptable? The outcome of this was a method – which we termed the consumer rejection threshold – that specifically used consumer preference responses [1]. In this case, consumers were, by definition, sensitive enough, but not too sensitive, to achieve the aim.
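The logic of a consumer rejection threshold can be sketched in a few lines. This is a simplified illustration, not the published procedure: it assumes that at each of a series of increasing TCA concentrations, consumers make a paired preference choice between a tainted and an untainted sample, and it takes as the threshold the lowest concentration at which significantly more than half prefer the untainted wine (one-sided exact binomial test, alpha = 0.05). The function names and the counts in the example are hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """One-sided exact p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rejection_threshold(results, alpha: float = 0.05):
    """Lowest concentration at which the untainted sample is preferred
    significantly more often than chance (0.5) in paired preference tests.

    `results` holds (concentration, n_preferring_untainted, n_consumers)
    tuples, assumed sorted by increasing concentration.
    Returns None if no concentration reaches significance.
    """
    for concentration, k, n in results:
        if binomial_tail(k, n) < alpha:
            return concentration
    return None


# Hypothetical counts for illustration only (not data from any study).
example = [(1, 27, 50), (2, 30, 50), (4, 33, 50), (8, 38, 50)]
print(rejection_threshold(example))  # 4
```

Stopping at the first significant concentration is the simplest possible rule; the original method may define the threshold differently.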


More fundamentally, perhaps, recent research has suggested that consumers might in fact be highly discriminating, at least partly as a function of not being trained. Analytical panelists become skilled at ignoring their emotional responses to the samples that they evaluate, on the (probably correct) assumption that analytical and hedonic approaches are necessarily antagonistic. If discrimination were purely a perceptual process, then we might argue that removing emotion from the equation is appropriate.

In fact, a case can be made that ignoring emotional responses actually impairs discriminative ability. Several studies over the past decade suggest that engaging and utilizing emotions may be critical to discrimination. An evaluation procedure known as the authenticity test has sometimes been shown to be superior to analytical methods at detecting product differences. The essence of this test is that the emotions of regular consumers of a product are manipulated by exposing them to a story that has a strong negative implication for their product. For example, in one study, regular consumers of Danish milk were told that foreign milk imports might be introduced onto the market [2]. Instead of being asked to find the different sample or the sample with the highest level of some property, the consumers are asked to pick the authentic (that is, Danish) sample. In this case, the milk samples differed according to feed type and storage time, and the authenticity test revealed that both factors had an impact on milk flavour.


Why do emotions improve discriminability? One persuasive argument is that eliciting emotional states allows access to the otherwise “unconscious”, implicitly learned information about the product that all regular consumers possess. In other words, we are all experts with respect to the flavour and other sensory characteristics of our favourite products, even if we do not possess explicit awareness of such knowledge or a detailed sensory lexicon [3]. This explanation (also called the mood-as-information hypothesis by social psychologists) proposes that negative emotions focus attention on deviations from the implicit memory of product attributes. Negative emotions are necessary because they probably act as signals that something is wrong. In effect, we monitor our own mood to infer that there is a problem. In turn, this evokes cognitive effort to search for causes of the “problem”. It becomes adaptive, therefore, to pay attention to details. In terms of the authenticity test, it is the inauthentic product that raises the alarm.


The ability of the authenticity test to induce emotional arousal and improve discrimination is also consistent with what we know from psychology and neuroscience about emotion and attention. It is clear that the brain allocates attention to stimuli as a function of their emotional significance. Thus, using emotional priming stimuli (a smiley face will do) in a detection task produces improved detection of very brief neutral stimuli that follow it. Similarly, emotional facial expressions command attention much more quickly than other sorts of stimuli, as do stimuli that have previously been paired with a reward.


Another way of considering the relationship between emotion and discrimination is to think about the latter as a choice situation. Choices and preferences are intimately linked (see, for example, Taste Matters website or blog, August 2012: Choosing to like or liking to choose). Making a discrimination is of course making a choice on some basis. Why pick one thing over another? If an emotion is crucial to the decision, then the decision has motivational implications. In some cases, discrimination between alternatives allows selection that contributes to survival, such as when we choose a high-energy food over a low-energy version.

Invoking a motivational explanation of discrimination leads to some interesting predictions. It suggests, for example, that we might be better at discriminating tastes or flavours when we are hungry … but only for those substances that are likely to reduce hunger. Cattle, sheep, horses and rabbits select and eat more forage cut later in the day, when its sugar content is highest. We, too, should be better at discriminating sweet – but not bitter – tastes of different intensities before lunch.