The story we are building
Full analysis map (mindmap)
For whom does SDM work?

**✅ BASE: Confirmed results**
- Performance track
  - ATE null: β₁ ns, all outcomes
  - HTE massive: β₂ 0.79–0.89***
  - GATES: G1=-0.83 · G2=-0.04 · G3=+1.35
  - CLAN top-5, all p<0.001
    - startup_founded +68pp
    - work_exp_year +6.3yr
    - managerial_exp +3.8yr
    - age +5.8yr
    - education -25pp
- SI absorption track
  - ATE +1.70*** (si_emp vs C)
  - HTE significant: β₂=0.53–0.67***
  - CLAN vs C: experienced + has product
  - CLAN vs E: educated + first venture
- Comparison (09_clan_comparison.R, done)
  - 5 vars SAME direction
  - education OPPOSITE (interesting tension)

**🔁 AXES TO VARY**
- A · Sample
  - P1–P5 ✅ main
  - P1–P2 PENDING (pre-attrition)
  - P1 only PENDING (cross-section)
- B · Outcome Y
  - log_sales ✅
  - any_sales ✅
  - log_emp ✅
  - log_sales_per_emp ✅
  - tfp ✅
  - si_theory_stock ✅
  - si_emp_stock ✅
  - si_thhp_stock PENDING
- C · Comparison
  - TE vs Control ✅
  - TE vs EB ✅
- D · GATES quantile spec
  - 2-groups ✅
  - 3-groups ✅ main
  - 4-groups ✅
  - Top20 vs Bot20 PENDING

**🔲 COMPLEMENTARY**
- OLS parametric triangulation
  - D × startup_founded PENDING
  - D × work_exp_year PENDING
  - D × college PENDING
  - D × managerial_exp PENDING
  - D × practitioner_type PENDING
- Types analysis
  - Define practitioner type
  - ATE by type
  - Type distribution by site
- Site HTE (exploratory)
  - Site dummies already in Z_i ✅
  - ATE by site PENDING
  - Composition by site PENDING

**📋 PERSISTENCE CRITERION**
- Survives ≥3 GATES specs
- Significant in ≥2 outcomes
- Holds in P1–P2 sample
- Confirmed by OLS
- Then it enters the paper

**📄 PAPER OUTPUT**
- §3 BLP table, main spec
- §3 GATES figure, G1-G2-G3
- §3 CLAN portrait of type
- §4 Mechanism via SI
- §5 Robustness tables
- Appendix: site analysis
The 4 axes of variation — what we can turn
All core analyses run the same GenericML pipeline. A finding becomes a paper result only if it persists when we change these 4 knobs.
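One way to keep track of what "changing the 4 knobs" means in practice is to enumerate the full specification grid. The sketch below is only illustrative: the axis values are copied from the mindmap, while the dictionary layout and the `spec_grid` helper are hypothetical and are not part of the project's R scripts. Labels use ASCII hyphens for convenience.

```python
from itertools import product

# Axis values as listed in the mindmap; the structure itself is an assumption.
AXES = {
    "sample": ["P1-P5", "P1-P2", "P1 only"],
    "outcome": [
        "log_sales", "any_sales", "log_emp", "log_sales_per_emp",
        "tfp", "si_theory_stock", "si_emp_stock", "si_thhp_stock",
    ],
    "comparison": ["TE vs Control", "TE vs EB"],
    "gates_groups": ["2-groups", "3-groups", "4-groups", "top20 vs bot20"],
}


def spec_grid(axes):
    """Return one dict per combination of axis values (hypothetical helper)."""
    keys = list(axes)
    return [dict(zip(keys, combo)) for combo in product(*(axes[k] for k in keys))]


grid = spec_grid(AXES)
print(len(grid))  # 3 * 8 * 2 * 4 = 192 combinations
print(grid[0])
```

Not every cell of this grid has to be run; the list is mainly a bookkeeping aid for marking which combinations are done and which are pending.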
Complementary analyses (beyond GenericML)
| ID | Analysis | What it answers | Status | Priority |
|---|---|---|---|---|
| **OLS Parametric Triangulation** | | | | |
| P-01 | Y ~ D × startup_founded + i.rct + i.period | Confirms CLAN pattern with interpretable coefficient | pending | High |
| P-02 | Y ~ D × work_exp_year + controls | Confirms experience pattern (continuous) | pending | High |
| P-03 | Y ~ D × college + controls | Confirms education paradox | pending | High |
| P-04 | Y ~ D × managerial_exp + controls | Managerial vs general experience | pending | Medium |
| P-05 | Y ~ D × practitioner_type + controls | Composite index: does the "type" have a single effect? | pending | High |
| **Types Analysis: "Reason about types" (Arnaldo)** | | | | |
| T-01 | Define practitioner type: startup_founded=1 AND work_exp > median AND college=0 | Is there a coherent "type" in the data? | pending | High |
| T-02 | Simple ATE by type: E[Y \| D=1, type=1] − E[Y \| D=0, type=1] | How big is the gain for the practitioner type? | pending | High |
| T-03 | Distribution of types by RCT site | Does site composition explain country-level differences? | pending | Medium |
| **Site-Level HTE: exploratory (CGJ 2025 shows significant site variation)** | | | | |
| S-01 | Site dummies already in Z_i: check if rct appears in top CLAN variables | Does site membership predict benefit group? | implicit | Check outputs |
| S-02 | ATE by site: Y ~ D + D×i.rct + controls | Is the ATE null uniform across sites? (CGJ: Colombia > India) | explore | Medium |
| S-03 | Type distribution by site: does practitioner type cluster by country? | Compositional explanation for site-level differences | explore | After T-01 |
| **Mechanism** | | | | |
| M-01 | CLAN comparison: performance vs SI absorption (09_clan_comparison.R) | Do the same people benefit in both tracks? | done | n/a |
| M-04 | Narrative mechanism: same type in both tracks → same story | Informal mediation without IV: theory works when implementable | pending | High (paper §4) |
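The type definition in T-01 and the difference in means in T-02 can be sketched in a few lines. This is a minimal illustration in plain Python, not the planned 12_types_analysis.R: the column names `work_exp`, `d`, and `y`, both helper functions, and the toy rows are all assumptions made for the example, and the numbers are not project data.

```python
from statistics import mean, median


def add_practitioner_type(rows):
    """T-01 sketch: type = startup_founded == 1, work_exp above the sample median, college == 0."""
    cutoff = median(r["work_exp"] for r in rows)
    for r in rows:
        r["type"] = int(
            r["startup_founded"] == 1 and r["work_exp"] > cutoff and r["college"] == 0
        )
    return rows


def ate_by_type(rows, outcome="y", treat="d"):
    """T-02 sketch: treated-minus-control difference in mean outcome within each type."""
    effects = {}
    for t in (0, 1):
        treated = [r[outcome] for r in rows if r["type"] == t and r[treat] == 1]
        control = [r[outcome] for r in rows if r["type"] == t and r[treat] == 0]
        # A cell with no treated or no control units has no estimate.
        effects[t] = mean(treated) - mean(control) if treated and control else None
    return effects


# Toy rows, invented purely to exercise the functions.
rows = [
    {"startup_founded": 1, "work_exp": 12, "college": 0, "d": 1, "y": 3.0},
    {"startup_founded": 1, "work_exp": 10, "college": 0, "d": 0, "y": 1.0},
    {"startup_founded": 0, "work_exp": 2, "college": 1, "d": 1, "y": 1.0},
    {"startup_founded": 0, "work_exp": 3, "college": 1, "d": 0, "y": 1.0},
    {"startup_founded": 1, "work_exp": 4, "college": 1, "d": 1, "y": 2.0},
    {"startup_founded": 0, "work_exp": 1, "college": 0, "d": 0, "y": 2.0},
]

rows = add_practitioner_type(rows)
effects = ate_by_type(rows)
print(effects)  # {0: 0.0, 1: 2.0}
```

A raw difference in means ignores site and period controls, so it is best read as a descriptive first look; the interaction spec in P-05 is where the controlled comparison would come from.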
Persistence criterion — what enters the paper (D-016)
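A finding enters the paper if it passes all 4 filters:
- Survives ≥3 GATES specs
- Significant in ≥2 outcomes
- Holds in the P1–P2 sample
- Confirmed by OLS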
Target: 2–3 robust patterns. Not a boilerplate list of effects.
Current candidates: startup_founded · work_exp_year · managerial_exp · age · education/college (all *** in ≥4 performance outcomes)
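The persistence criterion (survives ≥3 GATES specs, significant in ≥2 outcomes, holds in P1–P2, confirmed by OLS) is easy to encode as an explicit check, so the decision is applied the same way to every candidate. The sketch below is hypothetical: the `Finding` record, its field names, and the two example entries are placeholders, not project results.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Bookkeeping record for one candidate pattern (hypothetical structure)."""
    name: str
    gates_specs_passed: int    # number of GATES quantile specs in which it survives
    significant_outcomes: int  # number of outcomes in which it is significant
    holds_p1p2: bool           # holds in the P1-P2 (pre-attrition) sample
    confirmed_by_ols: bool     # confirmed by the OLS interaction spec


def enters_paper(f, min_gates_specs=3, min_outcomes=2):
    """Return True only if all four filters pass."""
    return (
        f.gates_specs_passed >= min_gates_specs
        and f.significant_outcomes >= min_outcomes
        and f.holds_p1p2
        and f.confirmed_by_ols
    )


# Placeholder entries with made-up values, only to show the check in use.
candidates = [
    Finding("pattern_a", gates_specs_passed=3, significant_outcomes=4,
            holds_p1p2=True, confirmed_by_ols=True),
    Finding("pattern_b", gates_specs_passed=3, significant_outcomes=4,
            holds_p1p2=False, confirmed_by_ols=True),
]

kept = [f.name for f in candidates if enters_paper(f)]
print(kept)  # ['pattern_a']
```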
What each analysis feeds into the paper
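- BLP table (§3, main spec): β₁ (ATE) and β₂ (HTE test), all outcomes × comparisons
- GATES figure (§3, G1-G2-G3): shows who gains and who doesn't
- CLAN portrait (§3): the "experienced practitioner" profile
- OLS parametric triangulation: D × var_CLAN coefficients, interpretable magnitudes
- Types analysis: "Type gains X log_sales; non-type gains 0"
- Mechanism via SI (§4): same type absorbs more SI → can implement theory
- Robustness tables (§5): CLAN patterns survive the pre-attrition sample
- Site analysis (Appendix): composition of types by country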
Execution order
| # | Task | Script | Block |
|---|---|---|---|
| 1 | Verify i.rct in 05_ols_interaction_robustness.do | quick check | A — robustness |
| 2 | Rerun 07+08 with P1–P2 only | 10_robustness_p1p2.R | A — robustness |
| 3 | Add top-20%/bottom-20% to 07 and 08 | edit 07_performance_hte.R + 08_si_hte.R | A — robustness |
| 4 | OLS interaction table (top-5 CLAN vars, all outcomes, with i.rct) | 11_ols_interactions.do | A — robustness |
| 5 | Define practitioner type + ATE by type | 12_types_analysis.R | B — types |
| 6 | Site-level ATE + type distribution by site | 12_types_analysis.R (add section) | B — types |
| 7 | Add si_thhp_stock to 08_si_hte.R | edit 08_si_hte.R | C — completeness |
| 8 | Causal Forest (optional, pre-submission) | new script | D — final robustness |
Note on rct / site controls
In GenericML (07 and 08): site dummies are in Z_i (lines 38–44 of 07_performance_hte.R). Y is residualized on period FE only — not site FE. This is deliberate: absorbing site variation in Y would prevent studying cross-site heterogeneity (CGJ 2025, p.66 explicit note). Site effects are absorbed within the ML nuisance models.
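To see why residualizing on period FE alone leaves cross-site variation in Y, note that regressing Y on a full set of period dummies (and nothing else) is the same as subtracting the period mean. The sketch below assumes exactly that simple case, which may differ from what 07_performance_hte.R does; the function name and the toy numbers are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def residualize_on_period(y, period):
    """Subtract the period mean from each outcome (period FE only, no site FE)."""
    by_period = defaultdict(list)
    for yi, p in zip(y, period):
        by_period[p].append(yi)
    period_means = {p: mean(values) for p, values in by_period.items()}
    return [yi - period_means[p] for yi, p in zip(y, period)]


# Toy data: site A sits 2 units above site B in both periods.
y = [4.0, 2.0, 6.0, 4.0]
period = [1, 1, 2, 2]
site = ["A", "B", "A", "B"]

resid = residualize_on_period(y, period)
print(resid)  # [1.0, -1.0, 1.0, -1.0]

# The period trend is gone, but the site gap is still in the residuals.
gap = mean(r for r, s in zip(resid, site) if s == "A") - mean(
    r for r, s in zip(resid, site) if s == "B"
)
print(gap)  # 2.0
```

Had site means been subtracted as well, the gap would be zero and there would be no cross-site variation left for the site dummies in Z_i to pick up.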
In OLS triangulation (Stata): i.rct must be included in all interaction specs.
Verify in 05_ols_interaction_robustness.do before running 11_ols_interactions.do.
CGJ 2025 reference: finds significant site heterogeneity (p=0.001 in CLAN). Colombia and Italy show highest effects; India and China lowest. Check if this aligns with the practitioner-type composition by site once T-01 is done.