Tuesday, October 22, 2013

"Effects are Public Property and not Personal Belongings": a Post-Publication Conversation

Welcome to a post-publication conversation on social-cognitive priming!  The impetus for the conversation is a social-cognitive priming article by Jostmann, Lakens, and Schubert that was published in 2009 in Psychological Science. The article is interesting enough in and of itself but what makes it an even more interesting discussion topic is that the authors themselves have performed and reported replication attempts of some of their findings. In addition, there are replication attempts by others.

The authors of the 2009 study, Nils Jostmann (NJ), Daniël Lakens (DL), and Thomas Schubert (TS), plus the author of a replication study, Hans IJzerman (HIJ), kindly agreed to respond to a series of questions I had prepared for them about the research. This allows a behind-the-scenes look at the original study, the decisions to perform replications, the evaluation of the replication attempts, and overall assessments of the main finding. The responses, which were given via email, are all the more interesting and instructive because they are remarkably open and self-critical. By way of disclosure I should note that I know Daniël Lakens, Hans IJzerman, and Thomas Schubert personally.

The basic idea behind the 2009 study is that importance is associated with weight. There are of course several expressions that associate weight with importance, like weighty matter and the heavyweights of the field, but more relevant is that weight and importance are associated in perception and action. The authors observe that heavy objects are more difficult to move or wield than light objects (and therefore are energetically more demanding). Also, being struck by a heavy object has more consequences than being struck by a light object. Jostmann et al. summarize the situation as follows: heavy objects have more impact on our bodies than light ones do. Their thesis is that the concept of importance is grounded in weight and they test this idea in several experiments.

 From their discussion of the grounding of importance in weight, Jostmann et al. derive the hypothesis that weight will influence judgments of importance. In Study 1 they test this idea by having subjects estimate the value of foreign currencies while holding a heavy or a light clipboard.

Question 1
This is a clever idea but a more direct test of your hypothesis would have been to have subjects judge the value of the clipboard itself (or more realistically of some other object). Why did you forego this direct test? 
NJ: We wanted to test the abstract implications of the assumed link between weight and importance. It seemed trivial - at least to me - that the weight of an object had an effect on its estimated value. Later we heard of studies (about the value of wine bottles) that confirmed that heavy objects are more valuable (at least under certain circumstances). 
DL: This idea seemed (and still seems) almost trivial. In general, it seems the direct relation is less interesting than examining how the physical experience of weight influences judgments of things that do not themselves cause the experience of weight. There are now several studies that show such direct links (for example, heavy wine bottles that are perceived to be more valuable) and we have found several of these effects in student projects (e.g., heavy paper cups are valued more, a light computer mouse is seen as less valuable compared to the same mouse with some lead hidden inside).
Question 2 
The inferential step you are making is that the weight of the clipboard gets transferred to the currency. This is an interesting idea. What is the mechanism you see at work here?  
NJ: Probably in the domain of money strong associations exist between weight and value, and apparently it's not so difficult to disguise that the felt weight is actually related to something irrelevant (i.e., the clipboard). 
DL: When we published the studies, my father said: Ah, so itʼs like an association, but then between a physical experience and an abstract concept? The more time passes, the more I think his description of the mechanism was pretty accurate.
Results (values averaged across currencies and across subjects) indicate that subjects holding the heavy clipboard judged the currencies to be more valuable than those holding the light clipboard, p = .04.

In Study 2, the researchers had subjects judge (again holding the clipboards) the importance of having a voice in a decision-making process. Their goal was to assess the effect of holding a clipboard in an abstract domain.
Question 3 
This seems a big step from Study 1. Do you think the same mechanism underlies responses in the two experiments?
NJ: I tend to think no. We ran Study 2 because we wanted to see whether the link between weight and importance affects judgments on topics that have nothing to do with weight (although some people have rightfully commented that justice is associated with a weighing scale). The mechanism is probably a bit more complex than in Study 1: weight is one dimension on which one can judge potency (i.e., what are the implications of something). It's not new that participants use these kinds of dimensions to make judgments unless they can discount them (see Briñol & Petty, 2008, for an elaboration of how bodily cues can affect attitudes and persuasion on various levels).
DL: I ran this study, and was doing some other studies on morality at the time. So in terms of things I was working on, it was actually a really small step. If you consider that people were most likely performing judgments under uncertainty in Study 1, and how fair something is can also be influenced quite easily, it seemed just more of the same, but a more relevant topic to examine.
Results showed that subjects in the heavy clipboard condition found having a voice in decision making more important than did participants in the light clipboard condition, p <  .05.

In Study 3, the authors reasoned that weight is associated with cognitive effort (e.g., it takes greater cognitive planning to move heavy objects than to move light ones). They tested this idea by having subjects (again with the clipboard) engage in a cognitive task and assess effort. Subjects described how much they liked the mayor of the city in which they were living, Amsterdam, and how satisfied they were with the city itself (quite sensibly, the subjects were very satisfied). The operationalization of cognitive elaboration was the correlation between the two types of statements.
Question 4
Again, this seems like a big step from the previous studies. Do you expect the same mechanism to be at work as in the previous studies? 
NJ: In this study, the outcome could have been a different one: a heavy clipboard could have made participants evaluate the mayor to be more important, powerful, valuable etc. We did not find this but judgment coherence instead. The pattern made sense to us though because the attitude literature argues that coherence can occur when people find a topic important. So, we did not predict exactly this finding (see question 6). Back then, I believed that I had a good explanation and thus did not need to mention that the findings were in fact exploratory (I now think that I should have mentioned it). As for the mechanism: participants probably already had a relatively strong pre-existing attitude towards the mayor (there was some controversy in the media about his political measures and some people found him weak while others found him strong). The heavy clipboard probably strengthened existing attitudes and made them more coherent but couldn't change them completely. Briñol and Petty have written a very interesting chapter on how bodily cues influence attitudes and persuasion.
Results show no main effect of clipboard. There was, however, a correlation between the mayor and city evaluations in the heavy clipboard condition (r = .42, p < .05) but not in the light clipboard condition (r = -.23, n.s.). The authors conclude that there was more cognitive elaboration in the heavy clipboard condition than in the light clipboard condition.
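A side note for readers who want to check the arithmetic: a significant correlation in one condition and a non-significant one in the other is not, by itself, a test of whether the two correlations differ. One common way to compare two independent correlations is Fisher's r-to-z transformation. The sketch below applies it to the two correlations reported above. The per-condition sample size is not given in this post, so `N_PER_CONDITION` is a purely hypothetical placeholder, and this is not necessarily the analysis the authors ran.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference between two independent correlations.

    Returns the z statistic and the two-sided p value.
    """
    # Fisher transformation: atanh(r) is approximately normal with SE 1/sqrt(n - 3)
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided p value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# HYPOTHETICAL sample size per condition (not reported in this post)
N_PER_CONDITION = 25

z, p = compare_independent_correlations(0.42, N_PER_CONDITION, -0.23, N_PER_CONDITION)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Plugging in the actual cell sizes from the paper would give the relevant number; the point is only that the comparison of interest is between conditions, not each correlation against zero.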
Question 5 
I don’t quite understand how this task measures cognitive elaboration. It seems a rather indirect way. Can you clarify?  Also, was this the pattern you had predicted?
 NJ: You are right that cognitive elaboration is just one possible explanation. A better test was done in study 4. 
Study 4 examines the effect of weight on the evaluation of strong versus weak arguments, again in an attempt to investigate the effect of weight on cognitive elaboration. The authors predict that holding the heavy clipboard will cause the subjects to assign proportionally more “weight” to strong arguments and less to weak ones, leading to more polarization in their evaluation of these arguments.
Question 6 
Again, this seems like a big step to me. What is the mechanism you think is at work here?
NJ: Probably the same mechanism as in Studies 2 and 3: weight signals potency, and if participants were looking for cues to how important or valuable the issue at stake was for them, they used the (actually irrelevant) information of the clipboard weight.
The results show an interaction between clipboard and argument strength, p = .008. Although subjects holding the light clipboard agreed more with the strong than with the weak arguments (p = .03), this difference was larger in the heavy clipboard condition (p < .001).

The authors conclude that “weight influences how people deal with abstract issues much as it influences how people deal with concrete objects: It leads to greater investment of effort. In our studies, weight led to greater elaboration of thought, as indicated by greater consistency between related judgments, greater polarization between judgments of strong versus weak arguments, and greater confidence in one’s opinion.”
Question 7  
Your studies focused more on abstract issues than on concrete objects. However, you did not conduct tests on concrete objects. Do you expect that the effect of weight would have been larger, equally large, or smaller if you had used objects? 
NJ: the effect seems to be stronger and more robust if the value of concrete objects is judged. We did the most difficult but perhaps also more interesting studies.
DL: I think that heavy and light objects are much more strongly related to psychological value. So, the effects should be larger. I would guess it is not difficult to find these effects – we have done so a number of times, and so have others.
The article was published and a number of years later, the authors did something remarkable. They posted a “failed” (we still have to establish what it means to say that a replication attempt “failed”) replication of one of the experiments in the paper (Study 3) on the PsychFileDrawer website.
Question 8  
What was your reasoning behind (1) conducting the replication and (2) posting it on the PsychFileDrawer site?
NJ: We conducted the replication study that is now on psychfiledrawer.org at the same time as the four published studies. As I have already said on psychfiledrawer, I believed back then that there were good reasons why the study failed (noise, changing attitudes, lack of power etc.) and it didn't occur to me that not mentioning it would do any harm. Only later, when I learned that people were interested in the replicability of our finding (we received several requests to help with meta-analyses), did we decide that we should publish the results of the study.
DL: First of all, the 'replication' was performed together with the initial studies, but because it was not significant, it was not submitted for publication. We now understand all our studies were underpowered, and not all studies should have been expected to work. When we published the paper, we did the normal thing and did not mention the non-significant finding. Now, with our increased understanding and thoughts about how you should do science, we wanted to do the right thing.
 Question 9  
How do you evaluate the result of your replication in the context of the original experiment?
 NJ: I still believe that there are good reasons why the study failed: noisy environment and a topic on which public attitudes were changing rapidly at that time. Lack of power was also a problem. 
DL: The robustness of the effect in that study remains uncertain.
There are two other replication attempts on the site performed by other researchers. One is a success (replicating the original Study 2) and the other a failure (not replicating the original Study 2).
Question 10  
How do you evaluate the result of these attempts?  
 NJ: Apparently, it's not so easy to replicate our effects but at least some independent researchers were successful. I think that there are some parameters that we still don't understand that are necessary to take into account to find the weight-importance effect. It would be cool if someone published a paper on when our findings replicate and when not (hopefully experimenter demand or other artefacts are not an issue but if so, I'll be able to live with it). 
DL: In the failed replication, there might have been a ceiling effect, as those authors note. Or, the effect might not be robust. We need meta-analyses to know more (and these are being performed).
IJzerman and colleagues attempted to replicate (Study 2 in their paper) the original Study 2 (apparently the most popular among replicators). They found that subjects holding the heavy clipboard gave higher importance ratings than subjects holding the light clipboard, but this difference was not significant, p = .12.
Question 11
What were your reasons for performing this replication attempt? And how do you evaluate the results? Does significance matter in a replication result? 
HIJ: Initially, a former student of mine (Justin Saddlemyer), Sander Koole and I wanted to investigate an individual difference variable that is both relevant to my earlier work (on warmth) and to Jostmann et al's (on weight). We started this project prior to the entire replication debate in psychology. So, the project started as a replication+extension project. Given the entire discussion on replication, we wanted to do an "intermediate reporting" of what we do know (the other results are promising, but we simply don't have enough answers yet to report in a publication).
Also, initially we did not evaluate the replication as properly as we probably should have. I think Uri Simonsohn's method is a useful one. We had submitted the project to PLoS ONE, but for some odd reason PLoS ONE is doubting the ethical procedures. We hope to get that sorted out and will do the rewrite, including Uri's method of evaluating the effect sizes. So no, we don't think the p value is necessarily the crucial way of evaluating a replication.
 Question 12 
How do the original authors evaluate this replication attempt and its result? 
NJ: Hans does not provide detailed information about how the study was run. The weight was different and the study was underpowered (as were ours). It's difficult to say why it didn't work. 
DL: I donʼt know the sample size in that study, but significance per se is less interesting when all studies are underpowered. Again, we have to wait for the meta-analysis. 
HIJ: Agreed that the study was underpowered. We still think it is useful to report these studies, but agree that if we were to run another study, we would probably do a registered report, examining all the details that we present in the Replication Recipe. If the present study were to be published, by the way, all details of the study would be uploaded to Dataverse, so the amount of detail would probably be greater than what we currently include in our research summaries (i.e., publication).
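Since "underpowered" comes up in nearly every answer, here is a rough sketch of what the term means in numbers for a two-condition design like heavy versus light clipboard. It uses a normal approximation to the power of a two-sided, two-sample comparison. The effect size and the sample sizes are hypothetical illustrations, not values taken from the original paper or from any of the replications.

```python
from statistics import NormalDist


def approx_power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation).

    d is the standardized mean difference (Cohen's d); equal group sizes assumed.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis
    noncentrality = d * (n_per_group / 2) ** 0.5
    upper_tail = 1 - std_normal.cdf(z_crit - noncentrality)
    lower_tail = std_normal.cdf(-z_crit - noncentrality)
    return upper_tail + lower_tail


# HYPOTHETICAL values chosen only for illustration
ASSUMED_EFFECT_SIZE = 0.5
for n in (20, 64):
    power = approx_power_two_groups(ASSUMED_EFFECT_SIZE, n)
    # Chance that four independent studies with this power would all be significant
    all_four_significant = power ** 4
    print(f"n per group = {n}: power = {power:.2f}, "
          f"P(4 of 4 significant) = {all_four_significant:.2f}")
```

The second number in each row is why a set of small studies that all "work" should raise eyebrows: when power per study is modest, an unbroken string of significant results is itself unlikely. An exact calculation would use the noncentral t distribution, which gives slightly lower power at small n.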
 Question 13 
How do you evaluate the usefulness of replications in general? Should researchers try to replicate their own results? 
NJ: yes, they should whenever possible. 
HIJ: Agreed, but ideally another lab should be able to replicate our studies. For this to happen, we do need to start reporting more detail of our studies.   
 Question 14 
Taking all the empirical evidence together, how much support is there for the notion that weight influences judgments of importance?
NJ: there is some support and I still believe that the link exists. Too many independent researchers (see M. Hafner, Experimental psychology) have successfully replicated our effects (even close replications) to make me think that we are dealing with a false positive. The effect might not be as strong as we thought though.
HIJ: I think more so than most social psychology studies. That said, many studies - including ours - are underpowered. Given what Nils mentions and the general theoretical premise, I agree that it is unlikely that this is a false positive.
DL: Well, there are some successful replications, many of the studies show effects in the right direction, and we seem to have a rather nice number of studies for a meta-analysis. I think overall there might be an indication something is going on, but we don't have a good grasp on the size of the effect, and the factors that influence the effect size. Still, it seems an interesting candidate to examine further. The effect of weight on concrete objects seems pretty large; whether it extends to abstract concepts requires further examination. (Other replication attempts of the weight-importance study are listed at the bottom of this page at Lakens' site.)
Question 15 
Do you have any additional comments?  
NJ: When I was a grad student I was surprised to see some researchers being very defensive regarding “their” effect. I’d like to thank all the people who contacted us about the reliability and validity of the weight-importance effect. They reminded me that effects are public property and not personal belongings. 
HIJ: I think for researchers it may sometimes be nerve-wracking if people try to replicate, in particular if they truly believe in certain effects. After all, your world view in a way is being violated. But, then again, I think these are exciting times, in that we get to know much more about effect sizes, how different effects scale up to one another, and what contextual factors are important in reproducing effects.  
TS: It was interesting though to see how you summarized the paper, and it made me realize something. Our paper was a combination of the embodiment idea that abstract concepts are grounded in concrete experience, and work on persuasion. Studies 1 and 2 were about the embodiment notion - importance of abstract issues, like monetary value and voice, were influenced by concrete experiences. Then, Studies 3 and 4 combined this with work on persuasion. Although we did not actually study persuasion, we measured outcomes of social cognition processes identified in work on attitudes in the persuasion literature. Our interpretation of these results was that the patterns (alignment of related attitudes and polarization of strong ones vs. weak ones) reflected effortful processing. It may have been pretty oblique in the paper, but there is a large body of work on attitudes in social cognition behind this notion. 
In hindsight, it might have made more sense to write two separate papers about these two parts, and to elaborate on each one more. Interestingly, the people who have followed up on this clearly were more interested in the first idea. 
Given that this is "a candid blog," it will surprise no one that I very much appreciate the candor and lack of defensiveness that is evident in these responses. I couldn't have summarized this discussion any better than Nils Jostmann just did: "effects are public property and not personal belongings." I look forward to hearing more about the meta-analyses of the weight-importance effect.


  1. Together with a group of researchers at the University of Amsterdam, we've recently tried to replicate this effect. We were particularly thorough, tested a ton of subjects (about 100 I believe, but I need to look this up), preregistered the study, and conducted a Bayesian analysis. There was no trace of an effect -- in fact, the data were more than 10 times more likely under H0 than under H1. I'm slowly starting to get to the point where I only trust these social priming effects when they've been replicated using study preregistration. I'm not saying the effect does not exist (!), but I am saying I'd like to see another few preregistered replications. It will be particularly interesting what happens with the preregistered studies in the special issues of Frontiers and Social Cognition.

    1. You're right that pre-registered replications are the way to go. I'm sure Nils, Daniel, Thomas, and Hans will agree. Hans and Daniel, for example, are engaged in pre-registered replication projects.

    2. What was the H1 you used, EJ: the point estimate of the paper's effect size or something else?

    3. nm, didn't refresh the page and see it in the poster now

    4. Preregistration is the future for studies that aren't replications too... but implementing it for replications will be easiest, and it's great to see many already underway.

  2. I wonder what would happen if, rather than being heavy and light versions of the same thing, the two objects were, say, a Fabergé egg and half a brick (or some less extreme dichotomy). That is, the physically heavier one is clearly less valuable/desirable than the lighter one. Maybe that would start to tease out whether the effect is due to mass (unless your funding extends to experiments in zero gravity environments) or some other, more metaphorical form of "weightiness" (e.g., "I bet that's worth a ton of money").

  3. Hi EJ, you mentioned this replication before - it would be great if you could make some details available, such as which study you replicated (my Bayesian prior tells me it is least likely to be Study 2 - correct?) and upload a small summary to PsychFileDrawer.org, with a short discussion of why it might be that the effect did not replicate. I think statements about not being able to replicate an effect are important to share, but you should only share them if you provide enough information to judge the replication.

  4. Hi EJ, you have asked me to help with the preparation of your replication study and to preregister with my name (which I both did). It would be nice to get informed about the results before I have to read your opinion in this blog. Best, Nils

  5. Hi Nils,

    Maybe you forgot you discussed the results in front of Titia's poster at the Res Mas conference about two weeks ago. Also, I am reporting facts, not an opinion, and I don't see why I need to be chastised publicly about reporting the outcome of scientific work that is highly relevant in the present context.



    1. Hi Eric-Jan,
      Yes, Titia was so kind to discuss the results with me after I coincidentally ran into her poster. Perhaps it's also informative to add here that it was an attempt to replicate one of the studies from Ackerman et al, 2010.

    2. Hi Nils,

      After some additional thought, I now understand your point better. Perhaps I should have communicated the results with you earlier, and more directly. I also should have made it clear on this blog that the replication concerns the study of Ackerman. Finally, I could have made a clearer distinction between the outcomes of the study and my general beliefs. I also want to stress that I appreciate your help in getting the details of this replication attempt just right.

      Let's discuss the next steps over a friendly cup of coffee instead of on this blog :-)


    3. Not that this blog is a non-friendly environment of course.;)

    4. Hi Eric-Jan,

      Thanks for this post. I should have contacted you directly rather than replying to you on this (however friendly) blog. Anyhow, let's have a coffee soon!

  6. Hi Daniel

    This is a blog, not an academic journal with peer review, and this means I feel somewhat uninhibited to provide the gist of the study and the results without going in to the nitty-gritty details. We are currently working to write this up. What I can do is ask Titia to provide a link to her poster in which she summarized the work. Once I have the poster I'll put it on dropbox and provide the link here.


    1. And here's the poster: https://dl.dropboxusercontent.com/u/1018886/Temp/PosterPresentation_TB.pdf
      I realize this does not provide all information, but it's the best I can do right now. We also have a very detailed preregistration form which I can send you in case you are interested.

  7. With the release of the iPad Air, along with much other tech, in which lighter = more expensive, do you think these perceptions would flip?

    1. Of course, we'd have to ask Nils and the others, but I'd thought something similar. When lightness is the dimension of interest (e.g., with running shoes, race bikes, and tents), one would expect light items to be valued the most. But this is the marked case.

    2. Hi Sara, I can imagine that the effect flips. I've recently reviewed a paper in which it was shown that holding a heavy (compared to light) clipboard increased the value (compared to costs) of skin cancer protection but, in an independent sample, decreased the value (again compared to the costs) of oral hygiene. Apparently, the effect of weight --> more value is not static. Consider also the famous backpack studies from Dennis Proffitt's lab: weight also signals costs. The moderator(s) still have to be identified.

  8. Hi EJ

    Thank you for the comment. I think there are a couple of things in regards to what you are saying. First, you know as well that we fully support preregistration, as you have seen one of our currently registered studies. The first study that you reviewed was also registered at the OSF, so you have seen a successful registered "social priming" study. So it's not true you have not seen one. Beyond that, we mostly do so now for (all our) replication studies, but we will also do this for novel research in the future.

    Also, can you please explain the concept of "social priming" to me? Are those effects that you don't believe? I fail to understand what exactly the concept of social priming means for some folks. It may be useful to define this concept, as it was not used by many researchers (in fact, we don't use the word "priming" at all for the warmth effects, because we believe it to be something different than the earlier reported effects).

    But, most importantly, if you don't believe our replication studies (which are, like Nils said, underpowered) because they are not preregistered, you would imply we obtain the effects because we are p-hacking? Again, I am in favor of the preregistration issue, but not buying the effects basically implies that we are doing something else with our data to obtain our effects. Is that what you mean? Could you perhaps clarify?

    The weight-importance "social priming" is the third effect we replicate....we will report on another one soon (unfortunately not preregistered, but with a large sample and a priori calculated sample size).


  9. It would be interesting to know if people's body weight plays any role in such judgments. For example, would a 50kg girl have the same feeling as a 100kg guy with regard to weight of an object?

  10. EJ,

    I wonder, does the word heavy/weighty in Dutch have a double meaning of physical weight and importance (like in English)? It seems to me that if the mechanism is an association between a physical body state and an abstract concept, then in cultures where there is no association between heaviness and importance, we should not expect the effect.

    In Russian for example, the word heavy is associated with the abstract concept of hard, as in difficult. Which would suggest priming weight would have an entirely different psychological consequence in Russia - perhaps holding a weighty clipboard will lead people to judge problem-sets difficult, etc.

    Finally, if the mechanism is indeed an association between two concepts being activated, how theoretically novel is the finding?

    1. Speaking for E.J., as another native speaker of the language, I can confirm that Dutch has the same double meaning that English has: zwaar/zwaarwichtig.

    2. Good question Garik. Hans assumes that the interpersonal warmth = physical warmth effect is more than an association (if I understood correctly). I'm curious to hear what the proposed mechanism is.

      If weight has no connotation with value or importance in Russian whatsoever, I'd love to run a weight study there. The language link can even be found in Chinese by the way, so it's not something specific to Germanic languages like Dutch or English.

    3. Fun fact, in Chinese characters, "sleep" is "eye" + "heavy".

  11. hi all

    Yes, we sure would not think that young infants have an abstract notion of trust. Instead, we would think that infants early on rely on basic "perceptual building blocks" (as Bowlby would have it, or see also Hendriks-Jansen or Fiske). We presume that infants learn how to trust others through these basic perceptual cues, one of which is warmth (but these may also be something like smell, or tone of voice). Given that communal sharing relationships are formed and marked through acts like touch or physical closeness (all of which invariably involve sensing something warm), it seems reasonable to assume that infants have some kind of predisposition to rely on warmth as a cue to trust others. Essentially, there's nothing abstract when we think about trust/affection, but I think we have to look more towards brain mechanisms for thermoregulation, and stuff like oxytocin secretion (OT, which leads to vasodilation and a warming of the skin).

    Yet, we have little support for this. There's indirect support for it (Harlow, amongst others), and at least what Lakoff and Johnson initially proposed is not right, in terms of their directionality argument. This does not mean, by the way, that strong associations may not be bi-directional, but the basic CMT argument just did not fit.

    Note: this does not mean that no associations are learned later (like in the case of different attachment styles, or learning about different groups of people), but to me it seems reasonable that a warm touch is basically what trust means for infants, and that only later we start building our abstract ideas around this (the scaffolding stuff others have talked about).

    A similar argument you guys also made, Nils, but I presume you are going back to the association account now between concrete and abstract experiences? What is the argument for weight now?


    1. Hi Hans,
      I guess I can't think of a strong test to differentiate between an embodied explanation and a priming effect.