2026-06-05 | Branch: validation-analysis
Two traditions are represented: the formative instrument tradition (Chandler, Cacciotti, Spieth21) and the composite/index tradition (Bloom & Van Reenen, Furman et al.).
| Criterion | Chandler JBV '11 | Cacciotti JBV '20 | Spieth21 JMS '21 | B&VR QJE '07 | Furman RP '02 | SDM |
|---|---|---|---|---|---|---|
| TIER 1 — MINIMUM | ||||||
| 1. Theoretical non-interchangeability | ● | ● | ● | ● | ● | ● Theory grounded |
| 2. Content validity | ● | ● | ● | ● | ◑ | ● 40-item pool from literature + 12 manager interviews + LLM |
| 3. VIF / non-redundancy | ○ | ◑ | ● | ○ | ○ | ● ★ VIF 1.02–1.23 across all 24 items |
| 4. CFA / model diagnostic | ● | ● | ● | ○ | ○ | ● 3-factor reflective CFA: CFI=.418, TLI=.355, RMSEA=.054. Poor fit correctly framed as the expected misspecification for a formative construct |
| TIER 2 — GOOD MANAGEMENT PAPER | ||||||
| 5. Experimental sensitivity | ○ | ○ | ○ | ○ | ○ | ● ★★ η²=.43, p<.001 |
| 6. Robustness alt weights | ○ | ○ | ○ | ● | ● | ● App A5.2: equal item weights, equal dimension weights, minimum operator |
| 7. Within-study criterion | ● | ● | ● | ○ | ○ | ◑ Vignette r=.19***, N=2,745, but the vignette is same-team and same-occasion. Within-step r=.052 > between-step r=.023 (structural coherence) |
| 8. Nomological validity | ● | ● | ● | ● | ● | ◑ AL Tier 1 b=0.16*** with firm FE, wild bootstrap, Holm correction. But not the only possible explanation |
| TIER 3 — TOP JOURNALS | ||||||
| 9. External criterion | ○ | ○ | ○ | ● | ● | ○ |
| 10. Convergent w/ est. scales | ○ | ● | ○ | ○ | ○ | ○ |
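Row 3's VIF range (1.02–1.23) can be read through a standard identity: item j's VIF equals 1/(1 − R²_j), where R²_j comes from regressing item j on all other items, and this equals the j-th diagonal element of the inverse item correlation matrix. A minimal standard-library sketch, assuming each item is a list of numeric responses of equal length with no constant items (the helpers `pearson`, `invert` and `vifs` and the toy data are hypothetical, not our analysis code):

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect collinearity among items")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half now holds A^-1.
    return [row[n:] for row in aug]


def vifs(columns):
    """VIF_j = j-th diagonal element of the inverse item correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]


# Toy data: three hypothetical items answered by five respondents.
items = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [5, 3, 4, 1, 2],
]
print([round(v, 2) for v in vifs(items)])
```

The toy items are strongly intercorrelated, so their VIFs are far above conventional cutoffs such as 5 or 10; values near 1, as reported for our 24 items, mean no item is close to a linear combination of the others.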
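Row 6 lists the three alternative aggregation rules from App A5.2. A minimal sketch of what each rule does to one respondent's item scores, assuming scores share a common scale and that the minimum operator is applied to dimension means (both are assumptions; the dimension labels D1 to D3, the scores and the function names are hypothetical). Robustness then means the substantive results should not hinge on which composite is used.

```python
from statistics import fmean

# Hypothetical respondent: item scores grouped by dimension (placeholder labels).
respondent = {"D1": [4, 5, 5, 4], "D2": [2, 3], "D3": [4, 4, 3]}


def equal_item_weights(dimensions):
    """Pool all items and average; dimensions with more items weigh more."""
    return fmean(x for items in dimensions.values() for x in items)


def equal_dimension_weights(dimensions):
    """Average the dimension means; each dimension counts equally."""
    return fmean(fmean(items) for items in dimensions.values())


def minimum_operator(dimensions):
    """Non-compensatory rule: the weakest dimension mean caps the composite."""
    return min(fmean(items) for items in dimensions.values())


for rule in (equal_item_weights, equal_dimension_weights, minimum_operator):
    print(rule.__name__, round(rule(respondent), 3))
```

This prints 3.778, 3.556 and 2.5: pooling items rewards the four-item dimension D1, while the minimum operator lets the weak two-item dimension D2 set the score.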
We made a thermometer. To validate it, we do not need another thermometer; we need boiling water, a known reference that tells us whether the thermometer indeed reads 100 degrees.
At this stage, we do not have a completed boiling-water solution with the data currently available. This is defensible for an RP Research Note, since no benchmark paper in the formative instrument tradition has boiling water either.
Note: "Prolific" in the options below can also mean a small new data-gathering exercise with students. If students are used, monetary cost may drop to $0; timing would depend on access and logistics.
| Option | Temperature | Cost (rough approx.) | Time (rough approx.) | What it adds | Feasibility |
|---|---|---|---|---|---|
| 1. Camuffo RCT assessment protocol on Prolific / students | Hot | $0-300 | 3-6 weeks | Criterion validity: an independent protocol designed by a different team scores the same construct | doable |
| 2. Prolific / students + external decision task | Warm-hot | $0-300 | 4-5 weeks | Quasi-criterion: someone external to the team designs another vignette-like measure | doable, but the scenarios must NOT be designed by our team. Could adapt from Bazerman & Moore (2013) or Kahneman-style decision cases |
| 3. Prolific / students AOT/NFC only | Warm | $0-250 | 3-4 weeks | Convergent validity: measures constructs close to SDM, e.g., AOT (actively open-minded thinking) and NFC (need for cognition) | doable |
| 4. All three in one Prolific / student session | Warm-to-hot | $0-350 | 4-5 weeks | Convergent measures + best available criterion* (Camuffo protocol OR external decision task) | Best: one small data-gathering exercise (~20 min) covering rows 9 and 10 |
* "Best available criterion" = the Camuffo RCT assessment protocol or an external vignette-like decision task designed by a non-team member. This is NOT boiling water, but it is the hottest water any comparable instrument paper has achieved.
** ORM would also require a response to Edwards (2011). True boiling water (behavioral traces) is unrealistic for any first instrument paper, so ORM may accept hot water plus an honest limitations section.