A Post-Publication Peer-Review (3PR) of Time, Money, and Morality
Gino, F., & Mogilner, C. (online, 2013). Time, Money, and Morality. Psychological Science. DOI: 10.1177/0956797613506438
File under:
HIBAR: Had I Been A Reviewer…
3PR: Post-Publication Peer-Review (or: 3mpirical Plausibility Resuscitation)
Performed by Fred Hasselman
Contact me if you have any questions
Introduction
The Time, Money, and Morality article has been HIBARed on Twitter and the Blogosphere (e.g., by Rolf Zwaan and Greg Francis), and the discussion seems to revolve around the validity of inferences based on p-values close to 0.05 (such values raise suspicions of p-hacking).
In short, the article reports 4 experiments testing 2 core postulates:
 Postulate 1: Priming Money activates self-interest and increases unethical behaviour
 Postulate 2: Priming Time activates self-reflection and decreases unethical behaviour
Unethical behaviour is operationalised as taking the opportunity to cheat on a task.
Priming methods vary across experiments, as do the tasks that provide an opportunity to cheat.
Experiment 1 tests the two postulates; Experiments 2-4 assess the role of self-reflection in cheating behaviour, which is operationalised differently in each experiment.
Hold on to your P-curves for a moment… Back to the basics!
In this Post-Publication Peer-Review (3PR) I demonstrate that there is indeed some cause for concern about the way these results are presented and interpreted. Was it p-hacking? … I don't know, and maybe I don't even care. To me this is an example of sloppy science: p-hacked or not, these results were allowed to be published by expert peers. It is more relevant to discuss the broken system of quality control that should have picked up on at least some of the following issues:
 Important information is missing:
 in general (e.g., number of subjects per condition, sample size determination)
 selectively across experiments (e.g., participants per cell, reporting of effect sizes)
 The analyses used on frequency data are inappropriate
 Invalid or biased inferences and oddities:
 No adjustments for multiple comparisons
 “Marginal significance” shifts ad hoc between 0.1 > p > 0.05
 Obvious intervening/mediator variable is omitted: Accuracy of performance
 No explanation of (conflicting) results across experiments (e.g., variation in amount of cheating)
 No explanation of why random assignment to design levels did not yield equal samples (none of the experiments has equal N per condition)
The article under scrutiny is by no means exceptional with respect to such issues. Moreover, the way frequency/proportion data are analysed in psychological science is generally awkward and, most of the time, completely wrong.
I will 3PR the data based on the information in the article and comment on the results:
I. Analysis of Proportion / frequency data
II. Analysis of Extent of Cheating data
III. HAPPEing: Hypothesising After Post-Publication Evaluation
The R code used to generate the results (and this page) is available in this Markdown file, and this post explains how to post to a WordPress blog.
I. Analysis of proportion / frequency data
Some concerns can be raised about the significant differences in proportion Cheating between the various conditions reported in the 4 experiments.
First and foremost, no corrections for multiple comparisons are conducted. Should one apply them, just 2 significant proportion differences remain: Money vs. Time in Experiments 1 and 4. In Experiment 3, the sample difference No Mirror: Money - Time was marginally significant in the 2nd significant digit (original: p = 0.015; Bonferroni-adjusted α = 0.013).
Second, no continuity correction is applied, although these proportions are calculated from discrete counts (participants). If a continuity correction is applied, 2-3 significant differences remain, depending on the significance level chosen:
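| Exp. | Contrast | Published | Continuity corrected | Bonferroni adjusted |
|---|---|---|---|---|
| 1 | Money-Time | < .001 | 4 × 10^-4 | < 0.0167 |
| 1 | Money-Ctrl | < .05 | 0.0894 | > 0.0167 |
| 1 | Time-Ctrl | < .05 | 0.0836 | > 0.0167 |
| 2 | Int: Money-Time | < .01 | 0.1493 | ~ 0.0125 |
| 2 | Per: Money-Time | > .05 | 1 | > 0.0125 |
| 2 | Money: Int-Per | < .03 | 0.0856 | > 0.0125 |
| 2 | Time: Int-Per | > .05 | 1 | > 0.0125 |
| 3 | Mir: Money-Time | > .05 | 0.7996 | > 0.0125 |
| 3 | NoM: Money-Time | < .003 | 0.0293 | ~ 0.0125 |
| 3 | Money: Mir-NoM | > .05 | 0.0537 | > 0.0125 |
| 3 | Time: Mir-NoM | > .05 | 1 | > 0.0125 |
| 4 | Money-Time | < .001 | 10^-4 | < 0.0167 |
| 4 | Money-Ctrl | < .05 | 0.0522 | > 0.0167 |
| 4 | Time-Ctrl | < .05 | 0.0752 | > 0.0167 |
| | Number sig. results | 9 | 3 | Original: 4, Continuity: 2 |

Ctrl = Control, Int = Intelligence, Per = Personality, Mir = Mirror, NoM = No Mirror. The Bonferroni α is 0.05/3 ≈ 0.0167 for Experiments 1 and 4 (3 comparisons) and 0.05/4 = 0.0125 for Experiments 2 and 3 (4 comparisons).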
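A minimal sketch of the two corrections discussed above, assuming the continuity-corrected values come from a Yates-corrected chi-square test on each 2×2 subtable (the correction R's prop.test applies by default for two groups, as far as I know). The helper names are hypothetical, not the code used for this post. With the cell counts for Experiments 2 and 3 inferred in the Note on n per condition below, it gives the 0.0293 (Exp. 3, NoM: Money-Time) and 0.1493 (Exp. 2, Int: Money-Time) entries.

```python
import math

def prop_test_2x2(x1, n1, x2, n2, correct=True):
    """Chi-square test of two proportions (x1/n1 vs. x2/n2) with optional Yates correction."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]      # rows: samples; cols: cheated, did not cheat
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    expected = [[n * c / total for c in col_totals] for n in (n1, n2)]
    # In a 2x2 table |O - E| is the same in every cell; the correction never exceeds it
    yates = min(0.5, abs(x1 - expected[0][0])) if correct else 0.0
    chi2 = sum((abs(o - e) - yates) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    p = math.erfc(math.sqrt(chi2 / 2))             # upper tail of chi-square with 1 df
    return chi2, p

def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)

# Experiment 3, No Mirror: Money (20 of 30 cheated) vs. Time (11 of 31 cheated)
for label, correct in [("uncorrected", False), ("continuity corrected", True)]:
    chi2, p = prop_test_2x2(20, 30, 11, 31, correct=correct)
    print(f"{label}: chi2 = {chi2:.3f}, p = {p:.4f}, Bonferroni (4 tests) = {bonferroni(p, 4):.4f}")
```

Note that an adjusted p-value compared with 0.05 and a raw p-value compared with 0.05/#tests lead to the same decision; the table above uses the second form.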
This calls for a more appropriate analysis of frequency data:
 Loglinear analysis of observed cell frequencies
 Exact odds ratios of 2×2 subtables to test hypotheses using Effect Size CIs
(Cheating can be considered a dichotomous response, so logistic regression could also be used; see III. HAPPEing)
Note:
Experiments 2 and 3 do not list n per condition, so the most likely values for n are assumed: values for which (1) Ncond * %Cheat is closest to an integer, (2) the cells are as equal as possible, and (3) the cells add up to the total N:
Experiment 2

| Prime | Assessment | Ncond * %Cheat = Ncheat (deviation) |
|---|---|---|
| Money | Personality | 36 * 0.2778 = 10.0008 (8 × 10^-4) |
| Time | Personality | 35 * 0.2857 = 9.9995 (5 × 10^-4) |
| Money | Intelligence | 38 * 0.5 = 19 (0) |
| Time | Intelligence | 33 * 0.303 = 9.999 (10 × 10^-4) |

Experiment 3

| Prime | Assessment | Ncond * %Cheat = Ncheat (deviation) |
|---|---|---|
| Money | Mirror | 31 * 0.387 = 11.997 (0.003) |
| Time | Mirror | 28 * 0.321 = 8.988 (0.012) |
| Money | No Mirror | 30 * 0.667 = 20.01 (0.01) |
| Time | No Mirror | 31 * 0.355 = 11.005 (0.005) |
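The three selection rules in this Note can be sketched as a brute-force search. This is a hypothetical helper, not the code used for this post; the search window of 20-45 participants per cell is my assumption, and the totals passed in are simply the sums of the n listed above.

```python
import itertools

def infer_cell_sizes(props, total, n_min=20, n_max=45):
    """Hypothetical helper: choose per-cell n such that n * p is (1) closest to an
    integer, (2) the cells are as equal as possible, and (3) they add up to total."""
    def deviation(n, p):
        x = n * p
        return abs(x - round(x))

    best_key, best_ns = None, None
    window = range(n_min, n_max + 1)
    for head in itertools.product(window, repeat=len(props) - 1):
        last = total - sum(head)                  # rule 3: cells must add up to N
        if not n_min <= last <= n_max:
            continue
        ns = head + (last,)
        dev = round(sum(deviation(n, p) for n, p in zip(ns, props)), 9)   # rule 1
        key = (dev, max(ns) - min(ns))                                    # rule 2 breaks ties
        if best_key is None or key < best_key:
            best_key, best_ns = key, ns
    return best_ns

# %Cheat per cell, in the row order of the tables above
print(infer_cell_sizes([0.2778, 0.2857, 0.5, 0.303], total=142))   # Experiment 2
print(infer_cell_sizes([0.387, 0.321, 0.667, 0.355], total=120))   # Experiment 3
```

Ranking by total deviation first and spread second is one reading of the rule order in the Note; a different weighting could pick different n when several candidates are close.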
1. Loglinear analysis of observed cell frequencies
Loglinear analysis, or Poisson regression using the generalised linear model, can be used to test whether relationships exist among the variables in a multiway contingency table. Here I analyse the number of participants in each cell of the design: the observed frequencies take the role of the dependent variable, and design factors such as Mediator, Prime and Cheating are treated as independent variables (another option would have been a logistic/probit regression with Cheating as the dependent binary/proportion variable).
Two types of result are given for each experiment:
First, a table listing deviance tests for the full (saturated) model. The analysis starts with the NULL model (all frequencies are equal) in the first row. Each subsequent row lists what happens to the deviance (of the model in the previous row) when a factor is added. A significant drop in deviance means that adding the factor to the model helps predict the difference between expected and observed frequencies. For hints of corroboration of the hypotheses reported in the paper, significant interactions between a design factor and Cheating are necessary.
Second, a mosaic plot is displayed: a graphical representation of the conditional cell frequencies. The mosaic plot also indicates which residual frequencies (observed - expected) are significantly below (red) or above (blue) the expected frequencies (residuals are interpretable as a Z-score). The coloured cells contribute most to a high, and possibly significant, value of the test statistic.
Note:
The significance of the change in deviance can depend on the order in which factors are added to the model and is not the same as a significant beta weight in a regression model.
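A minimal sketch of the residuals behind the mosaic shading, assuming they are Pearson residuals (O - E)/√E under the independence model (a common default in mosaic-plot tools; which model the original plots used is not stated). It uses the Experiment 3 No Mirror subtable with the cell counts inferred in the Note on n per condition; a common shading convention (e.g., in R's vcd package) colours cells with |r| > 2. The function name is hypothetical.

```python
import math

def pearson_residuals(table):
    """Pearson residuals (O - E) / sqrt(E) under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[(o - r * c / n) / math.sqrt(r * c / n) for o, c in zip(row, col_totals)]
            for row, r in zip(table, row_totals)]

# Experiment 3, No Mirror; rows: Money, Time; columns: cheated, did not cheat
residuals = pearson_residuals([[20, 10], [11, 20]])
for label, row in zip(("Money", "Time"), residuals):
    flags = ["*" if abs(r) > 2 else "" for r in row]   # cells a |r| > 2 rule would colour
    print(label, [f"{r:+.2f}{f}" for r, f in zip(row, flags)])
# The squared residuals sum to the (uncorrected) Pearson chi-square statistic
print("chi2 =", round(sum(r * r for row in residuals for r in row), 3))
```

In this subtable no single cell crosses the |r| > 2 threshold, even though the uncorrected chi-square test on the whole subtable is below .05.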
> [1] "Experiment 1"
> Analysis of Deviance Table
>
> Model: poisson, link: log
>
> Response: Count
>
> Terms added sequentially (first to last)
>
>
> Df Deviance Resid. Df Resid. Dev Pr(>Chi)
> NULL 5 24.8
> Cheating 1 9.33 4 15.4 0.00225 **
> Prime 2 0.02 2 15.4 0.98981
> Cheating:Prime 2 15.41 0 0.0 0.00045 ***
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> [1] "Experiment 2"
> Analysis of Deviance Table
>
> Model: poisson, link: log
>
> Response: Count
>
> Terms added sequentially (first to last)
>
>
> Df Deviance Resid. Df Resid. Dev Pr(>Chi)
> NULL 7 19.64
> Cheating 1 13.86 6 5.78 0.0002 ***
> Prime 1 0.25 5 5.52 0.6146
> Test 1 0.00 4 5.52 1.0000
> Cheating:Prime 1 1.51 3 4.02 0.2198
> Cheating:Test 1 2.53 2 1.48 0.1114
> Prime:Test 1 0.03 1 1.45 0.8609
> Cheating:Prime:Test 1 1.45 0 0.00 0.2284
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> [1] "Experiment 3"
> Analysis of Deviance Table
>
> Model: poisson, link: log
>
> Response: Count
>
> Terms added sequentially (first to last)
>
>
> Df Deviance Resid. Df Resid. Dev Pr(>Chi)
> NULL 7 11.50
> Cheating 1 2.14 6 9.36 0.144
> Prime 1 0.03 5 9.32 0.855
> Test 1 0.03 4 9.29 0.855
> Cheating:Prime 1 4.24 3 5.05 0.040 *
> Cheating:Test 1 2.85 2 2.21 0.092 .
> Prime:Test 1 0.50 1 1.71 0.481
> Cheating:Prime:Test 1 1.71 0 0.00 0.191
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> [1] "Experiment 4"
> Analysis of Deviance Table
>
> Model: poisson, link: log
>
> Response: Count
>
> Terms added sequentially (first to last)
>
>
> Df Deviance Resid. Df Resid. Dev Pr(>Chi)
> NULL 5 21.3
> Cheating 1 4.22 4 17.1 0.03996 *
> Prime 2 0.29 2 16.8 0.86607
> Cheating:Prime 2 16.76 0 0.0 0.00023 ***
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
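Two rows of the Experiment 2 deviance table can be checked by hand, because their deviance drop equals the likelihood-ratio statistic G² = 2 Σ O ln(O/E) of a marginal table. The Cheating row compares the cheat/no-cheat totals with equal frequencies. The Cheating:Prime row (added after the main effects) equals the G² independence test on the Prime × Cheating table collapsed over Test. A minimal pure-Python sketch, assuming the cell counts inferred in the Note on n per condition; it stands in for R's glm/anova and the helper names are hypothetical.

```python
import math

def g2(observed, expected):
    """Likelihood-ratio (deviance) statistic G^2 = 2 * sum(O * ln(O / E)); empty cells add 0."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

# Experiment 2 collapsed over Test: [cheated, did not cheat] per prime,
# using n = 36, 38 (Money) and 35, 33 (Time) with 10, 19 and 10, 10 cheaters
money_prime = [10 + 19, (36 - 10) + (38 - 19)]
time_prime = [10 + 10, (35 - 10) + (33 - 10)]
n_total = sum(money_prime) + sum(time_prime)

# "Cheating" row: cheat/no-cheat totals against equal frequencies
cheat_totals = [money_prime[0] + time_prime[0], money_prime[1] + time_prime[1]]
g2_cheat = g2(cheat_totals, [n_total / 2, n_total / 2])

# "Cheating:Prime" row: independence test on the collapsed Prime x Cheating table
observed = money_prime + time_prime
expected = [sum(row) * col / n_total for row in (money_prime, time_prime) for col in cheat_totals]
g2_cp = g2(observed, expected)
p_cp = chi2_sf_1df(g2_cp)

print(f"Cheating:       G2 = {g2_cheat:.2f}, p = {chi2_sf_1df(g2_cheat):.4f}")
print(f"Cheating:Prime: G2 = {g2_cp:.2f}, p = {p_cp:.4f}")
```

This shortcut only works for terms whose drop in deviance reduces to a marginal table; the three-way and Test-related rows need the full model fit.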
Conclusion loglinear analysis:
This alternative (and, in my opinion, more appropriate) analysis agrees with the results after correction for multiple comparisons and continuity:
 The mosaic plots show that there may be some unexpected factors driving the “effects” reported in the paper:
 In Experiments 1 and 4 it is not so much the observed frequency of people who did cheat, but the number of participants who did not cheat that deviates from the expected frequencies based on the table margins.
 The Money prime caused fewer people NOT to cheat, whereas the Time prime caused more people NOT to cheat.
 If there is a difference in the amount of Cheating between samples, it is most likely a “main effect” difference between the Time and Money primes (the Cheating:Prime interaction), which causes a significant drop in deviance in Experiments 1, 3 and 4.
 Experiment 2 stands out: the observed difference in Cheating frequency (the Cheating main effect) is unlikely to be due to chance, but none of the other factors help explain differences between expected and observed frequencies.
The point about the mosaic plots is not just semantics or methodologists’ nitpicking. What they tell us is that, e.g., in the mosaic plot Table.1.1, among the observed frequencies of CheatYES, the cell Money does not stand out much from Time and Control relative to what may be expected by chance; for CheatNO, on the other hand, the cell Money does stand out as different.
2. Exact odds ratios of 2×2 subtables to test hypotheses using Effect Size CIs
Effect Size Confidence Intervals:
To get a clearer idea about the significance of differences between cells, I calculate confidence intervals around the effect size associated with contingency tables. The CIs in Figure 1 below are based on the exact odds ratio (using the noncentral hypergeometric distribution) for 2×2 subtables of the full design, obtained from Fisher's Exact Test, testing against the null hypothesis OR = 1.
> [1] "Figure 1. Exact log Odds Ratio's of 2x2 tables comparing frequency of Cheating between independent samples in each experiment."
Note:
Here, the confidence levels have been adjusted to account for the fact that 3 (Exp. 1 and 4) and 4 (Exp. 2 and 3) subtables of the full design were compared (confidence level = 1 - 0.05/#tests). The exact p-value from Fisher’s exact test reported in the figure was multiplied by the number of comparisons in each experiment.
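A minimal pure-Python sketch of the exact odds ratio for one 2×2 subtable. The estimate and CI are conditional maximum-likelihood quantities from Fisher's noncentral hypergeometric distribution, and the two-sided p-value sums all tables no more likely than the observed one. As far as I know this mirrors what R's fisher.test does, but that is an assumption, and the helper names are hypothetical. The confidence level is 1 - 0.05/#tests and the p-value is multiplied by #tests, as described in the Note.

```python
import math

def _nc_pmf(m, n, k, lo, hi, log_psi):
    """Fisher's noncentral hypergeometric pmf of the top-left cell, given the table margins."""
    logw = [math.log(math.comb(m, x)) + math.log(math.comb(n, k - x)) + x * log_psi
            for x in range(lo, hi + 1)]
    top = max(logw)                              # subtract the max to avoid overflow
    w = [math.exp(v - top) for v in logw]
    s = sum(w)
    return [v / s for v in w]

def _bisect(f, left=-30.0, right=30.0, iters=200):
    """Root of an increasing function f on [left, right] (log odds-ratio scale)."""
    for _ in range(iters):
        mid = (left + right) / 2
        if f(mid) < 0:
            left = mid
        else:
            right = mid
    return (left + right) / 2

def exact_or_test(a, b, c, d, n_tests=1, alpha=0.05):
    """Conditional MLE odds ratio, exact CI and two-sided Fisher p for [[a, b], [c, d]]."""
    m, n, k = a + b, c + d, a + c                # row 1, row 2 and column 1 totals
    lo, hi = max(0, k - n), min(k, m)            # support of the top-left cell
    xs = range(lo, hi + 1)

    def pmf(t):
        return _nc_pmf(m, n, k, lo, hi, t)

    # Two-sided p-value under H0: OR = 1, summing tables no more likely than the observed one
    p0 = pmf(0.0)
    p_obs = p0[a - lo]
    p = min(1.0, sum(q for q in p0 if q <= p_obs * (1 + 1e-7)))
    p_adj = min(1.0, p * n_tests)                # p multiplied by the number of comparisons

    tail = (alpha / n_tests) / 2                 # confidence level 1 - alpha / n_tests
    def mean(t):
        return sum(x * q for x, q in zip(xs, pmf(t)))
    def upper(t):
        return sum(q for x, q in zip(xs, pmf(t)) if x >= a)
    def lower(t):
        return sum(q for x, q in zip(xs, pmf(t)) if x <= a)

    if a == lo:
        est = 0.0
    elif a == hi:
        est = math.inf
    else:
        est = math.exp(_bisect(lambda t: mean(t) - a))
    ci_low = 0.0 if a == lo else math.exp(_bisect(lambda t: upper(t) - tail))
    ci_high = math.inf if a == hi else math.exp(_bisect(lambda t: tail - lower(t)))
    return est, (ci_low, ci_high), p, p_adj

# Experiment 3, No Mirror: rows Money / Time, columns cheated / did not cheat; 4 subtables compared
est, (ci_low, ci_high), p, p_adj = exact_or_test(20, 10, 11, 20, n_tests=4)
print(f"OR = {est:.2f}, CI = [{ci_low:.2f}, {ci_high:.2f}], p = {p:.4f}, adjusted p = {p_adj:.4f}")
```

The conditional estimate is usually a little closer to 1 than the sample odds ratio ad/bc, which is one reason it is preferred for small cell counts.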
Conclusion Proportion data
 If there is an effect, it exists as a “main-effect” difference between the Money and Time primed samples in Experiments 1 and 4.
 Experiment 3 No Mirror: Money - Time is a marginal case.
 Experiment 2 did not yield any substantial effects.
 4-5 out of 7 statistical inferences in the paper that are based on proportion data should be considered invalid.
II. Analysis of extent of cheating
The extent of Cheating concerns the difference between a participant's actual accuracy (which is not provided as a result) and their reported accuracy.
Experiments 1-3 report analyses of the extent of Cheating, including means and SDs. Sample size assumptions for Experiments 2 and 3 are the same as above.
Compare Cohen’s d CIs
I created CIs around the effect sizes based on the means and SDs reported for Experiments 1-3, using the R package MBESS.
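MBESS builds exact CIs for Cohen's d by inverting the noncentral t distribution. The sketch below swaps in the simpler large-sample normal approximation for the standard error of d, so its intervals will differ somewhat from the exact ones for small samples. The function name is hypothetical, and the means, SDs and ns in the usage line are placeholders, not the values reported in the paper.

```python
import math
from statistics import NormalDist

def cohens_d_ci(m1, sd1, n1, m2, sd2, n2, level=0.95):
    """Cohen's d for two independent samples with an approximate (normal-theory) CI."""
    # Pooled standard deviation
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    # Large-sample standard error of d; NOT the exact noncentral-t interval that MBESS computes
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return d, (d - z * se, d + z * se)

# Hypothetical placeholder summary statistics (mean, SD, n per sample), not the paper's values
d, (ci_low, ci_high) = cohens_d_ci(5.0, 2.0, 30, 3.0, 2.0, 30)
print(f"d = {d:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

As with the odds ratios, a Bonferroni-style adjustment would widen these intervals by raising `level` to 1 - 0.05/#tests.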
> [1] "Figure 2. Cohen's d with exact CIs comparing extent of Cheating between independent samples in experiment 1-3."
Conclusion Extent of Cheating
The pattern is the same as the previous analyses:
 Experiment 1 shows a clear effect between Money and Time samples
 Experiment 3 No Mirror: Money - Time is again a close call
III. HAPPEing (Hypothesising After Post-Publication Evaluation)
Should reviewers have noticed these issues with data analysis?
Yes, they should have!
Even without reanalysing the published data as I have done here, the conclusions by the authors can be questioned based on a comparison of very elementary results:
“Across four experiments, using different primes and a variety of measures and tasks, we consistently found that shifting people’s attention to time decreases dishonesty. Priming time makes people reflect on who they are, and this self-reflection reduces their likelihood of behaving dishonestly.”
The key is to compare the results across the 4 experiments and evaluate whether it is valid to infer that the core postulates have been corroborated. The designs and materials differ slightly each time, but if the outcomes (e.g., proportion of cheating behaviour) vary systematically with one or more of the experimental differences, there may be another variable at work here.
One result that begs explanation is the drop in proportion Cheating in all the samples of Experiment 2 compared to the other experiments. What is special about its procedure and methods? Regrettably, more than 1 potential intervening factor changes with respect to Experiment 1.
A second odd omission in the interpretation of the results is the level of accuracy achieved by participants. In Experiments 1-3, the urge to cheat must have been weaker when a participant had achieved 90% accuracy. Experiment 4 is somewhat different in that the cheating opportunity concerns one “bottleneck” problem that is difficult to solve, but has to be correct in order to make the other, more easily solvable problems count towards the final reward. Here, accuracy could have the opposite effect, with less accurate participants cheating less: if 0 or only 1 extra item past the “bottleneck” item were solved, a participant might be less inclined to cheat than a participant who solved every problem except for the “bottleneck” item.
What is mediating what?
The figure below shows the relationship between the maximal financial incentive that could be awarded and the proportion of cheating for each prime and experimental condition (indicating whether a mediator variable was manipulated in addition to exposure to a prime). Note that the Intelligence and the No Mirror conditions of Experiments 2 and 3, respectively, are considered similar to Experiments 1 and 4; that is, they reflect a condition in which Self-reflection was not induced by any means other than priming:
This relationship can be tested in a generalised linear model, of course being fully aware that this is exploratory HAPPEing. I assume the samples from each experiment are independent and use the number of cheaters vs. non-cheaters as the dependent binomial variable. The model contains only those effects for which data are available (e.g., no interactions involving both Prime and Mediator).
Note:
A generalised linear mixed model (GLMM) with sample ID as a random effect gives similar results.
>
> Call:
> glm(formula = cbind(CheatYES, CheatNO) ~ Reward + Prime + Mediator +
> Reward * Prime + Reward * Mediator, family = binomial, data = reward)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -1.153 -0.695 0.122 0.251 1.956
>
> Coefficients:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -0.4495 0.2195 -2.05 0.0405 *
> Reward 0.0111 0.0219 0.51 0.6125
> PrimeNone 0.5868 0.3973 1.48 0.1397
> PrimeMoney 0.6040 0.2824 2.14 0.0325 *
> MediatorSelfreflection -0.8128 0.3147 -2.58 0.0098 **
> Reward:PrimeNone 0.0167 0.0359 0.47 0.6416
> Reward:PrimeMoney 0.0698 0.0327 2.13 0.0329 *
> Reward:MediatorSelfreflection -0.0189 0.0434 -0.44 0.6626
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
> Null deviance: 76.292 on 13 degrees of freedom
> Residual deviance: 11.035 on 6 degrees of freedom
> AIC: 82.48
>
> Number of Fisher Scoring iterations: 4
> [1] "Null-model deviance test: p < 1.33525644154704e-11"
In the table above, the model Intercept corresponds to the log odds of Cheating when the predictors have the values Prime = Time, Mediator = None and Reward = 0. Compared to the overall probability of observing Cheating behaviour, it thus seems that when the Time prime is presented without an induction of Self-reflection and without a financial reward incentive, the odds of Cheating drop.
This appears to corroborate the second postulate, but note that in this analysis (just as in the previous analyses) there is no real difference between the Time prime and Prime = None. The standard errors around these parameters are quite high. A clearer picture emerges when the Intercept is defined as Prime = None, Mediator = None and Reward = 0 and the odds ratios are compared (exponentiation of the parameter estimates):
> [1] "Odds Ratios compared to Prime = None, with profile likelihood CI.95"
> OR 2.5 % 97.5 %
> (Intercept) 1.15 0.60 2.21
> Reward 1.03 0.97 1.09
> PrimeTime 0.56 0.25 1.21
> PrimeMoney 1.02 0.47 2.21
> MediatorSelfreflection 0.44 0.24 0.81
> Reward:PrimeTime 0.98 0.92 1.05
> Reward:PrimeMoney 1.05 0.98 1.14
> Reward:MediatorSelfreflection 0.98 0.90 1.07
The odds ratios in the table above are multiplicative changes to the odds of Cheating = 1 when the predictor increases by 1 unit (for a factor level: relative to the reference level). So an OR < 1 decreases the odds of observing Cheating behaviour and an OR > 1 increases them. The 95% CIs are based on the profile likelihood and show that in most cases the effect covers a range both below and above 1. Only the CI for the effect of Self-reflection lies entirely below 1.
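The point estimates in the odds-ratio table follow from the Time-reference coefficients by switching the reference level of Prime to None (treatment coding) and exponentiating. A minimal sketch, assuming negative signs for the Intercept, MediatorSelfreflection and Reward:MediatorSelfreflection coefficients, as implied by the odds-ratio table. The profile-likelihood CIs need the raw data and are not reproduced; the function name is hypothetical.

```python
import math

# Coefficients with Prime = Time and Mediator = None as reference levels, from the glm() output
# (signs of the Intercept and the two Self-reflection terms assumed negative, as the ORs imply)
COEF_TIME_REF = {
    "(Intercept)": -0.4495,
    "Reward": 0.0111,
    "PrimeNone": 0.5868,
    "PrimeMoney": 0.6040,
    "MediatorSelfreflection": -0.8128,
    "Reward:PrimeNone": 0.0167,
    "Reward:PrimeMoney": 0.0698,
    "Reward:MediatorSelfreflection": -0.0189,
}

def to_none_reference(c):
    """Re-express treatment-coded coefficients with Prime = None as the reference level."""
    return {
        "(Intercept)": c["(Intercept)"] + c["PrimeNone"],
        "Reward": c["Reward"] + c["Reward:PrimeNone"],
        "PrimeTime": -c["PrimeNone"],
        "PrimeMoney": c["PrimeMoney"] - c["PrimeNone"],
        "MediatorSelfreflection": c["MediatorSelfreflection"],
        "Reward:PrimeTime": -c["Reward:PrimeNone"],
        "Reward:PrimeMoney": c["Reward:PrimeMoney"] - c["Reward:PrimeNone"],
        "Reward:MediatorSelfreflection": c["Reward:MediatorSelfreflection"],
    }

# Odds ratios are the exponentiated coefficients
odds_ratios = {name: math.exp(b) for name, b in to_none_reference(COEF_TIME_REF).items()}
for name, value in odds_ratios.items():
    print(f"{name:30s} {value:.2f}")
```

Changing the reference level only reshuffles the parameterisation; the fitted probabilities and the deviance of the model stay the same.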
One can interpret the modelled relationship between these variables as follows:
 There is a weak positive association between the Maximal Financial Reward and the Probability of Cheating
 The association changes with the value of Prime, becoming stronger when Money is primed and weaker when Time is primed
 The induction of Self-reflection does not change the association; it changes the intercept, the baseline Probability of Cheating at Reward = 0
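A minimal sketch of how this interpretation follows from the fitted model: a predicted probability is the inverse logit of the linear predictor. It uses the Time-reference coefficients from the glm() output, with the signs implied by the odds-ratio table (an assumption for the Intercept and the two Self-reflection terms). The reward values are illustrative, not the levels used in the experiments, and `p_cheat` is a hypothetical helper.

```python
import math

# Time-reference coefficients (Prime = Time, Mediator = None); three signs assumed, see above
COEF = {
    "(Intercept)": -0.4495,
    "Reward": 0.0111,
    "PrimeNone": 0.5868,
    "PrimeMoney": 0.6040,
    "MediatorSelfreflection": -0.8128,
    "Reward:PrimeNone": 0.0167,
    "Reward:PrimeMoney": 0.0698,
    "Reward:MediatorSelfreflection": -0.0189,
}

def p_cheat(reward, prime="Time", selfreflection=False):
    """Predicted probability of Cheating (inverse logit of the linear predictor)."""
    eta = COEF["(Intercept)"] + COEF["Reward"] * reward
    if prime != "Time":                       # Time is the reference level: no extra terms
        eta += COEF["Prime" + prime] + COEF["Reward:Prime" + prime] * reward
    if selfreflection:                        # shifts the intercept (slope change is tiny)
        eta += COEF["MediatorSelfreflection"] + COEF["Reward:MediatorSelfreflection"] * reward
    return 1 / (1 + math.exp(-eta))

# Illustrative reward values
for reward in (0, 10, 20):
    row = {prime: round(p_cheat(reward, prime), 2) for prime in ("Time", "None", "Money")}
    print(reward, row)
```

The steeper rise for Money comes entirely from the Reward:PrimeMoney term, which is the interaction the conclusions below rely on.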
A graphical representation of the model predictions more clearly reveals this relationship:
Conclusions, Discussion and further HAPPEing
 The significant results between Time and Money in Experiments 1 and 4 probably arise from the increase in Probability of Cheating when there is a financial reward and Money is primed.
 It is unlikely there are any other “real” differences in these data except for the induction of Self-reflection: model predictions show it decreases the Probability of Cheating by the same amount for different primes
 Note that there were no actual data points for None + Self-reflection
 The missing predictors in the Probability of Cheating analysis are the actual and reported accuracy of the performance (number of correctly solved problems and money received, respectively). These values cannot be inferred from the extent of cheating analyses. It seems reasonable to assume that in most experiments participants who were more accurate had less incentive to engage in Cheating.
 This brings up the question of whether the effects are driven by some sort of Speed-Accuracy instruction: naturally, Time = Money, but taking the time to solve the problems may lead to higher accuracy and less incentive to cheat; likewise, a focus on getting as many answers as possible may introduce errors and promote cheating.
In science there is a moral obligation to do the best one can to be as accurate as possible, and usually this means it is wise to be as modest as possible about one’s scientific claims. I am not an expert in this field, but the sheer number of questions that can be raised about the validity of the inferences made in this paper makes one wonder who the peers were that achieved consensus about the credibility of this research and what their area of expertise was.
I am not saying this is irrelevant, or poor research; the two effects that survive the scrutiny of 3PR are certainly interesting. I am just a little worried this paper says more about the morality of contemporary scientific publishing than the scientific study of moral behaviour.
Some notes about this file:
 This file was created using Markdown in RStudio. Unless otherwise indicated in the code blocks (e.g., by require), the basic R packages are used.
 All the analyses are based on results reported in the publication.
 The one true gospel on statistical inference does not exist and more than one approach to analyse these data may be defensible.
 Therefore: Please be aware these comments and suggestions reflect my own preferences and standards in these matters. If you feel I should change some of my preferences and/or standards please let me know, because I review and adjust them on a regular basis.