A recent article by Diane Poulin-Dubois and colleagues at Concordia University is interesting both because it reports on a fascinating area of study (imitation in infants) and because it illustrates several common flaws in experimental psychology. The original article is here and you can read about it in this blog post.
Briefly, Poulin-Dubois et al. primed 14-month-old toddlers with the actions of either a “reliable” or an “unreliable” adult. The reliable adult would look inside a container, which the infants had previously been led to expect might contain a toy. The adult would put on a happy face as on seeing something fun, then hand over the container (in which there was, indeed, a fun toy) to the infant. In the “unreliable” condition, everything was the same except that the container did not contain a toy.
In the second part of the experiment, infants observed the same adult turning on a light switch with their forehead (as in the well-known experiments of Gergely et al., 2002). They were then encouraged to imitate the adult with the words “Now it’s your turn.” Significantly fewer infants imitated the adult model exactly (i.e., using their forehead to turn on the light rather than, more naturally, their hands) in the unreliable condition than in the reliable condition.
First of all, kudos to the authors for a very elegant experimental design, neatly combining the two paradigms of selective learning (as in Paul Harris’s work) and imitation (as in Gergely & Csibra’s work). My issue – as so often in experimental psychology – is not with the design but with the interpretation, which is wildly overblown. I initially thought the title of the blog report I just linked to (“Toddlers Won’t Bother Learning from You if You’re Daft”) might be misrepresenting the authors’ argument, only to find that they make similar claims in the original article (e.g., in both the title, “Infants Prefer to Imitate a Reliable Person”, and the discussion, “… the same behavior performed by a previously unreliable adult is interpreted as irrational or inefficient, thus not worthy of imitating”).
There are three main flaws with this argument, all of which are common flaws in experimental psychology. First, “reliability” may be too narrow an interpretation of whatever property of the adult’s behaviour is influencing the infant’s behaviour. Put yourself in the toddler’s bootees. In one condition you have an adult who makes nice smiley faces and keeps showing you a fun toy; in another, an adult who also makes nice smiley faces but who keeps showing you an empty container. Which one is more fun, and therefore more worthy of attention? In order to isolate “reliability” as the relevant property, one would need two additional control conditions in which neutral faces were used. (If it’s all about reliability, an adult who makes a neutral face and shows the infant a toy should be less worthy of imitation than one who makes a neutral face and shows them an empty container. I’ll leave it for the reader to judge the plausibility of that prediction.)
To their credit, Poulin-Dubois et al. do acknowledge this possibility – and the need for follow-up studies along the lines I just mentioned – in their discussion. A second flaw is more serious. This is the over-ascription to an entire population of a property that has been demonstrated in a sub-group. (Again, this is all too common in psychology: I am guilty of it myself, in an article where I discussed the implications of children’s generic tendency to tattle on peers, even though I had observed that several children never tattled at all.) If we look at the actual data for this study, we find that 61% of children imitated the model in the reliable condition, and 34% imitated in the unreliable condition. Assuming that individual performances would be reliable across trials, this suggests that about a third of 14-month-olds do not imitate strangers, about a third do imitate strangers, and about a third are sensitive to the stranger’s “reliability” (or whatever). This is not at all what the authors imply in the passages I quoted earlier, namely that all infants are sensitive to a model’s reliability.
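The rough three-way split can be made explicit with a little arithmetic. The sketch below is in Python and rests on the same assumptions as the paragraph above: each infant would behave consistently across trials, and the two groups were drawn from the same population. The variable names are my own.

```python
# Percentages of exact (forehead) imitation reported for each condition.
reliable_pct = 61
unreliable_pct = 34

# Assumption (mine, not the authors'): every infant belongs to one of
# three stable types, and both groups sample the same population.
always_imitate = unreliable_pct                      # imitate even the "unreliable" adult
condition_sensitive = reliable_pct - unreliable_pct  # imitate only the "reliable" adult
never_imitate = 100 - reliable_pct                   # imitate neither adult

print(always_imitate, condition_sensitive, never_imitate)
```

Under these assumptions, the infants whose behaviour actually depends on the condition turn out to be the smallest of the three groups.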
Fig. 2. Percentage of children who use their forehead or hand to imitate in each reliability condition. (from http://www.sciencedirect.com/science/article/pii/S0163638311000221#bib0065)
I think these two criticisms are particularly strong when put together. Really there is a whole package of differences between the two conditions. Some individuals are likely to be sensitive to some of the differences (e.g. the difference in reliability), others to other differences (e.g. whether they actually get shown a toy). So the main conclusion that we can draw from this study is that imitation will vary according to the social context, and that different individuals are (already, at 14 months) sensitive to different aspects of the social context. Reliability may be one relevant aspect of the social context, but from this study alone, it’s hard to be sure. (This is not really a direct contradiction of what the authors are saying, but semantics is important, as it shapes how we think about what we are studying.)
Actually, though, even this conclusion may be going too far, because my third criticism calls into question whether the authors have even shown a reliable difference in imitation per se. The third, and perhaps the most nefarious, common flaw in experimental psychology is to engineer an analysis that suits one’s conclusions. I didn’t notice this in the current study at first, but became troubled when I realised that they had completely excluded those individuals who did not touch the light switch at all. This might have been fine if more infants had failed to touch the switch in the unreliable condition; but in fact, 10 infants failed to touch it in the reliable condition, compared to only 3 in the unreliable condition!
This is a bit weird. “Fussy” infants (those who do not behave themselves during the experiment) had already been excluded, so I don’t think the problem here is a lack of attention paid to the model. Are we supposed to believe that a complete failure to emulate the goal of the adult (turning on the light) is irrelevant to the analysis? Given the three action possibilities of imitating exactly, emulating the goal, and completely ignoring the model, I can think of three ways of analysing the data:
(1) Define imitation as exact imitation, and compare its frequency with emulating + ignoring.
(2) Define imitation as exact imitation + goal emulation, and compare their combined frequency with ignoring.
(3) (The most impartial option): Compare the frequencies of all three types of action across the two conditions.
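The three analyses above can be sketched in a few lines of Python. This is only an illustration of how the same counts get collapsed in each case, not a reanalysis: the counts below are invented placeholders, not the study's data, and the helper names (`chi_square`, `p_value`, `tables`) are my own. I use a plain Pearson chi-square test for simplicity; with cell counts this small, an exact test would be the safer choice.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value(stat, df):
    """Upper-tail chi-square probability (closed forms for df = 1 and df = 2)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")

# PLACEHOLDER counts per condition: (exact imitation, goal emulation, ignoring).
# These are invented for illustration and are NOT the published frequencies.
counts = {"reliable": (10, 10, 10), "unreliable": (5, 15, 3)}

tables = {
    # (1) exact imitation vs. emulating + ignoring
    "exact vs rest": [[e, g + i] for e, g, i in counts.values()],
    # (2) exact imitation + goal emulation vs. ignoring
    "any copying vs ignoring": [[e + g, i] for e, g, i in counts.values()],
    # (3) all three action types kept separate
    "three-way": [list(c) for c in counts.values()],
}

for name, table in tables.items():
    stat, df = chi_square(table)
    print(f"{name}: chi2 = {stat:.2f}, df = {df}, p = {p_value(stat, df):.3f}")
```

The point of laying it out this way is that the choice of how to collapse the categories is itself an analytic decision, and each choice can give a different answer from the same infants.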
Ignoring the ignorers is not really a sensible option, because if we reverse-engineer the frequencies of each action (they only give percentages) we get the following:
|Exact imitation||Goal emulation||Ignoring|
It would be interesting to get hold of the raw data to see which differences are statistically significant, but already it is interesting that in both conditions, exact imitation only took place in a minority of cases – not really in keeping with the authors’ message. Furthermore, although the sample size is small, it looks like the higher frequency of ignoring in the “reliable” condition is comparable to the lower frequency of emulation. My suspicion is that if all three options were included in the analysis, the impact of condition would be insignificant in the context of the overall error variance.
This does make me a little wary of the original experiments by Gergely and colleagues, and I will have a look at whether they included “ignoring” in the analysis: it seems a little arbitrary to exclude it. Another revelation for me is that imitation is actively encouraged in the child by the exhortation “Now it’s your turn.” Presumably Gergely did that too, yet his experiments are often discussed (and compared to similar experiments with chimpanzees) as if they are examples of spontaneous imitation. Children’s propensity to take part in imitation games – an activity which it is obviously harder to encourage in chimps – has quite different theoretical implications, it seems to me …