The authors of the 2009 study, Nils Jostmann (NJ), Daniël Lakens (DL), and Thomas Schubert (TS), plus the author of a replication study, Hans IJzerman (HIJ), kindly agreed to respond to a series of questions I had prepared for them about the research. This allows a behind-the-scenes look at the original study, the decisions to perform replications, the evaluation of the replication attempts, and overall assessments of the main finding. The responses, which were given via email, are all the more interesting and instructive because they are remarkably open and self-critical. By way of disclosure I should note that I know Daniël Lakens, Hans IJzerman, and Thomas Schubert personally.
The basic idea behind the 2009 study is that importance is associated with weight. There are of course several expressions that associate weight with importance, like "weighty matter" and "the heavyweights of the field," but more relevant is that weight and importance are associated in perception and action. The authors observe that heavy objects are more difficult to move or wield than light objects (and are therefore energetically more demanding). Also, being struck by a heavy object has more consequences than being struck by a light object. Jostmann et al. summarize the situation as follows: heavy objects have more impact on our bodies than light ones do. Their thesis is that the concept of importance is grounded in weight, and they test this idea in several experiments.
From their discussion of the grounding of importance in weight, Jostmann et al. derive the hypothesis that weight will influence judgments of importance. In Study 1 they test this idea by having subjects estimate the value of foreign currencies while holding a heavy or a light clipboard.
This is a clever idea but a more direct test of your hypothesis would have been to have subjects judge the value of the clipboard itself (or more realistically of some other object). Why did you forego this direct test?
NJ: We wanted to test the abstract implications of the assumed link between weight and importance. It seemed trivial - at least to me - that the weight of an object had an effect on its estimated value. Later we heard of studies (about the value of wine bottles) that confirmed that heavy objects are more valuable (at least under certain circumstances).
DL: This idea seemed (and still seems) almost trivial. In general, the direct relation seems less interesting than examining how the physical experience of weight influences judgments about things that do not themselves cause the experience of weight. There are now several studies that show such direct links (for example, heavy wine bottles are perceived to be more valuable), and we have found several of these effects in student projects (e.g., heavy paper cups are valued more, and a light computer mouse is seen as less valuable than the same mouse with some lead hidden inside).
The inferential step you are making is that the weight of the clipboard gets transferred to the currency. This is an interesting idea. What is the mechanism you see at work here?
NJ: Probably in the domain of money strong associations exist between weight and value, and apparently it's not so difficult to disguise that the felt weight is actually related to something irrelevant (i.e., the clipboard).
DL: When we published the studies, my father said: "Ah, so it's like an association, but then between a physical experience and an abstract concept?" The more time passes, the more I think his description of the mechanism was pretty accurate.
Results (values averaged across currencies and across subjects) indicate that subjects holding the heavy clipboard judged the currencies to be more valuable than those holding the light clipboard, p = .04.
In Study 2, the researchers had subjects judge (again holding the clipboards) the importance of having a voice in a decision-making process. Their goal was to assess the effect of holding a clipboard in an abstract domain.
This seems a big step from Study 1. Do you think the same mechanism underlies responses in the two experiments?
NJ: I tend to think not. We ran Study 2 because we wanted to see whether the link between weight and importance affects judgments on topics that have nothing to do with weight (although some people have rightfully commented that justice is associated with a weighing scale). The mechanism is probably a bit more complex than in Study 1: weight is one dimension on which one can judge potency (i.e., what the implications of something are). It's not new that participants use these kinds of dimensions to make judgments unless they can discount them (see Briñol & Petty, 2008, for an elaboration of how bodily cues can affect attitudes and persuasion on various levels).
DL: I ran this study, and was doing some other studies on morality at the time. So in terms of things I was working on, it was actually a really small step. If you consider that people were most likely performing judgments under uncertainty in Study 1, and how fair something is can also be influenced quite easily, it seemed just more of the same, but a more relevant topic to examine.
Results showed that subjects in the heavy clipboard condition found having a voice in decision making more important than did participants in the light clipboard condition, p < .05.
Again, this seems like a big step from the previous studies. Do you expect the same mechanism to be at work as in the previous studies?
NJ: In this study, the outcome could have been a different one: a heavy clipboard could have made participants evaluate the mayor to be more important, powerful, valuable, etc. We did not find this; we found judgment coherence instead. The pattern made sense to us, though, because the attitude literature argues that coherence can occur when people find a topic important. So, we did not predict exactly this finding (see question 6). Back then, I believed that I had a good explanation and thus did not need to mention that the findings were in fact exploratory (I now think that I should have mentioned it). As for the mechanism: participants probably already had a relatively strong pre-existing attitude towards the mayor (there was some controversy in the media about his political measures, and some people found him weak while others found him strong). The heavy clipboard probably strengthened existing attitudes and made them more coherent, but it couldn't change them completely. Briñol and Petty have written a very interesting chapter on how bodily cues influence attitudes and persuasion.
Results show that there was no main effect of clipboard, but there was a correlation between the mayor and city evaluations in the heavy clipboard condition (r = .42, p < .05) and not in the light clipboard condition (r = -.23, n.s.). The authors conclude that there was more cognitive elaboration in the heavy clipboard condition than in the light clipboard condition.
I don’t quite understand how this task measures cognitive elaboration. It seems a rather indirect way. Can you clarify? Also, was this the pattern you had predicted?
NJ: You are right that cognitive elaboration is just one possible explanation. A better test was done in Study 4.
Study 4 examines the effect of weight on the evaluation of strong versus weak arguments, again in an attempt to investigate the effect of weight on cognitive elaboration. The authors predict that holding the heavy clipboard will cause the subjects to assign proportionally more “weight” to strong arguments and less to weak ones, leading to more polarization in their evaluation of these arguments.
Again, this seems like a big step to me. What is the mechanism you think is at work here?
NJ: Probably the same mechanism as in Studies 2 and 3: weight signals potency, and if participants were looking for cues as to how important or valuable the issue at stake was for them, they used the (actually irrelevant) information of the clipboard weight.
The results show an interaction between clipboard and argument strength, p = .008. Although subjects holding the light clipboard agreed more with the strong than with the weak arguments (p = .03), this difference was larger in the heavy clipboard condition (p < .001).
The authors conclude that “weight influences how people deal with abstract issues much as it influences how people deal with concrete objects: It leads to greater investment of effort. In our studies, weight led to greater elaboration of thought, as indicated by greater consistency between related judgments, greater polarization between judgments of strong versus weak arguments, and greater confidence in one’s opinion.”
Your studies focused on abstract issues rather than on concrete objects; you did not conduct any tests with concrete objects. Do you expect that the effect of weight would have been larger, equally large, or smaller if you had used objects?
NJ: The effect seems to be stronger and more robust if the value of concrete objects is judged. We did the more difficult but perhaps also more interesting studies.
DL: I think that heavy and light objects are much more strongly related to psychological value. So, the effects should be larger. I would guess it is not difficult to find these effects – we have done so a number of times, and so have others.
The original authors posted a “failed” (we still have to establish what it means to say that a replication attempt “failed”) replication of one of the experiments in the paper (Study 3) on the PsychFiledrawer website.
What was your reasoning behind (1) conducting the replication and (2) posting it on the PsychFiledrawer site?
NJ: We conducted the replication study that is now on psychfiledrawer.org at the same time as the four published studies. As I have already said on psychfiledrawer, I believed back then that there were good reasons why the study failed (noise, changing attitudes, lack of power, etc.), and it didn't occur to me that not mentioning it would do any harm. Only later, when I learned that people were interested in the replicability of our finding (we received several requests to help with meta-analyses), did we decide that we should publish the results of the study.
DL: First of all, the 'replication' was performed together with the initial studies, but because it was not significant, it was not submitted for publication. We now understand that all our studies were underpowered, and not all studies should have been expected to work. When we published the paper, we did the normal thing and did not mention the non-significant finding. Now, with our increased understanding and thoughts about how you should do science, we wanted to do the right thing.
How do you evaluate the result of your replication in the context of the original experiment?
NJ: I still believe that there are good reasons why the study failed: noisy environment and a topic on which public attitudes were changing rapidly at that time. Lack of power was also a problem.
DL: The robustness of the effect in that study remains uncertain.
There are two other replication attempts on the site performed by other researchers. One is a success (replicating the original Study 2) and the other a failure (not replicating the original Study 2).
How do you evaluate the result of these attempts?
NJ: Apparently, it's not so easy to replicate our effects but at least some independent researchers were successful. I think that there are some parameters that we still don't understand that are necessary to take into account to find the weight-importance effect. It would be cool if someone published a paper on when our findings replicate and when not (hopefully experimenter demand or other artefacts are not an issue but if so, I'll be able to live with it).
DL: In the failed replication, there might have been a ceiling effect, as those authors note. Or, the effect might not be robust. We need meta-analyses to know more (and these are being performed).
IJzerman and colleagues attempted to replicate the original Study 2 (apparently the most popular among replicators) in Study 2 of their own paper. They found that subjects holding the heavy clipboard gave higher importance ratings than subjects holding the light clipboard, but this difference was not significant, p = .12.
What were your reasons for performing this replication attempt? And how do you evaluate the results? Does significance matter in a replication result?
HIJ: Initially, a former student of mine (Justin Saddlemyer), Sander Koole, and I wanted to investigate an individual difference variable that is relevant both to my earlier work (on warmth) and to Jostmann et al.'s (on weight). We started this project prior to the entire replication debate in psychology. So, the project started as a replication+extension project. Given the entire discussion on replication, we wanted to do an "intermediate reporting" of what we do know (the other results are promising, but we simply don't have enough answers yet to report in a publication).
Also initially, we did not evaluate the replication as properly as we probably should have. I think Uri Simonsohn's method is a useful one. We had submitted the project to PLOS ONE, but for some odd reason PLOS ONE is doubting the ethical procedures. We hope to get that sorted out and will do the rewrite, including Uri's method of evaluating the effect sizes. So no, we don't think the p value is necessarily the crucial way of evaluating a replication.
How do the original authors evaluate this replication attempt and its result?
NJ: Hans does not provide detailed information about how the study was run. The weight was different and the study was underpowered (as were ours). It's difficult to say why it didn't work.
DL: I don't know the sample size in that study, but significance per se is less interesting when all studies are underpowered. Again, we have to wait for the meta-analysis.
HIJ: Agreed that the studies are underpowered. We still think it is useful to report these studies, but agree that if we were to run another study, we would probably do a registered report, examining all the details that we present in the Replication Recipe. If the present study were to be published, by the way, all details of the study would be uploaded to Dataverse, so the amount of detail would probably be greater than what we currently include in our research summaries (i.e., publications).
How do you evaluate the usefulness of replications in general? Should researchers try to replicate their own results?
NJ: Yes, they should, whenever possible.
HIJ: Agreed, but ideally another lab should be able to replicate our studies. For this to happen, we do need to start reporting more detail of our studies.
Taking all the empirical evidence together, how much support is there for the notion that weight influences judgments of importance?
NJ: There is some support, and I still believe that the link exists. Too many independent researchers (see M. Hafner, Experimental Psychology) have successfully replicated our effects (even in close replications) for me to think that we are dealing with a false positive. The effect might not be as strong as we thought, though.
HIJ: I think more so than for most social psychology studies. That said, many studies, including ours, are underpowered. Given what Nils mentions and the general theoretical premise, I agree that it is unlikely that this is a false positive.
DL: Well, there are some successful replications, many of the studies show effects in the right direction, and we seem to have a rather nice number of studies for a meta-analysis. I think overall there might be an indication that something is going on, but we don't have a good grasp of the size of the effect or of the factors that influence the effect size. Still, it seems an interesting candidate to examine further. The effect of weight on concrete objects seems pretty large; whether it extends to abstract concepts requires further examination. (Other replication attempts of the weight-importance study are listed at the bottom of this page at Lakens' site.)
Do you have any additional comments?
HIJ: I think for researchers it may sometimes be nerve-wracking if people try to replicate, in particular if they truly believe in certain effects. After all, your world view in a way is being violated. But, then again, I think these are exciting times, in that we get to know much more about effect sizes, how different effects scale up to one another, and what contextual factors are important in reproducing effects.
TS: It was interesting, though, to see how you summarized the paper, and it made me realize something. Our paper was a combination of the embodiment idea that abstract concepts are grounded in concrete experience and work on persuasion. Studies 1 and 2 were about the embodiment notion: the importance of abstract issues, like monetary value and voice, was influenced by concrete experiences. Then, Studies 3 and 4 combined this with work on persuasion. Although we did not actually study persuasion, we measured outcomes of social cognition processes identified in work on attitudes in the persuasion literature. Our interpretation of these results was that the patterns (alignment of related attitudes and polarization of strong vs. weak arguments) reflected effortful processing. It may have been pretty oblique in the paper, but there is a large body of work on attitudes in social cognition behind this notion.
In hindsight, it might have made more sense to write two separate papers about these two parts, and to elaborate on each one more. Interestingly, the people who have followed up on this clearly were more interested in the first idea.

Given that this is "a candid blog," it will surprise no one that I very much appreciate the candor and lack of defensiveness that is evident in these responses. I couldn't have summarized this discussion any better than Nils Jostmann just did: "effects are public property and not personal belongings." I look forward to hearing more about the meta-analyses of the weight-importance effect.