HTE Paper — Analysis Map

The story we are building

Average treatment effect on performance is statistically indistinguishable from zero (β₁ not significant for any outcome)

↓ but

Massive heterogeneity: β₂ = 0.79–0.89*** across all performance outcomes

↓ so who gains?

The "experienced practitioner" type: serial entrepreneur, +6yr work exp, older, less formally educated

↓ and why?

Same type absorbs more SI — suggesting a mechanism: theory works when the founder can implement it

↓ is it robust?

Must survive multiple GATES specs + pre-attrition sample + OLS triangulation — then it enters the paper
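For reference, β₁ and β₂ in the story above are the two coefficients of the Best Linear Predictor (BLP) regression in the generic ML framework of Chernozhukov, Demirer, Duflo and Fernández-Val. The block below is a sketch of the standard weighted specification, on the assumption that scripts 07 and 08 follow it (they are not reproduced in this map). Here $S(Z)$ is the ML proxy for the conditional treatment effect $s_0(Z)$, $p(Z)$ is the propensity score, $X_1$ collects a constant and baseline proxies, and $X$ is the full regressor vector.

```latex
\begin{aligned}
Y &= \alpha' X_1 + \beta_1\,\bigl(D - p(Z)\bigr)
     + \beta_2\,\bigl(D - p(Z)\bigr)\bigl(S(Z) - \mathbb{E}_N[S(Z)]\bigr) + \varepsilon,\\
w(Z) &= \frac{1}{p(Z)\,\bigl(1 - p(Z)\bigr)},\qquad
\mathbb{E}\bigl[w(Z)\,\varepsilon\,X\bigr] = 0,\\
\beta_1 &= \mathbb{E}\bigl[s_0(Z)\bigr]\quad\text{(ATE)},\qquad
\beta_2 = \frac{\operatorname{Cov}\bigl(s_0(Z),\,S(Z)\bigr)}{\operatorname{Var}\bigl(S(Z)\bigr)}\quad\text{(heterogeneity loading)}.
\end{aligned}
```

Under this specification β₂ is zero when there is no heterogeneity or when the proxy does not track it, so rejecting β₂ = 0 is evidence of heterogeneity that the proxy captures, even when β₁ is not significant.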

Full analysis map (mindmap)

HTE Paper: For whom does SDM work?

  ✅ BASE: confirmed results
    Performance track
      ATE null: β₁ ns, all outcomes
      HTE massive: β₂ 0.79–0.89***
      GATES: G1=-0.83 · G2=-0.04 · G3=+1.35
      CLAN top-5, all p<0.001
        startup_founded +68pp
        work_exp_year +6.3yr
        managerial_exp +3.8yr
        age +5.8yr
        education -25pp
    SI absorption track
      ATE +1.70*** (si_emp vs C)
      HTE significant: β₂=0.53–0.67***
      CLAN vs C: experienced + has product
      CLAN vs E: educated + first venture
    Comparison (09_clan_comparison.R, done)
      5 vars SAME direction
      education OPPOSITE (interesting tension)

  🔁 AXES TO VARY
    A · Sample
      P1–P5 ✅ main
      P1–P2 PENDING (pre-attrition)
      P1 only PENDING (cross-section)
    B · Outcome Y
      log_sales ✅
      any_sales ✅
      log_emp ✅
      log_sales_per_emp ✅
      tfp ✅
      si_theory_stock ✅
      si_emp_stock ✅
      si_thhp_stock PENDING
    C · Comparison
      TE vs Control ✅
      TE vs EB ✅
    D · GATES quantile spec
      2-groups ✅
      3-groups ✅ main
      4-groups ✅
      Top20 vs Bot20 PENDING

  🔲 COMPLEMENTARY
    OLS parametric triangulation
      D × startup_founded PENDING
      D × work_exp_year PENDING
      D × college PENDING
      D × managerial_exp PENDING
      D × practitioner_type PENDING
    Types analysis
      Define practitioner type
      ATE by type
      Type distribution by site
    Site HTE (exploratory)
      Already in Z_i: site dummies ✅
      ATE by site PENDING
      Composition by site PENDING

  📋 PERSISTENCE CRITERION
    Survives ≥3 GATES specs
    Significant in ≥2 outcomes
    Holds in P1–P2 sample
    Confirmed by OLS
    Then it enters the paper

  📄 PAPER OUTPUT
    §3 BLP table, main spec
    §3 GATES figure G1-G2-G3
    §3 CLAN portrait of type
    §4 Mechanism via SI
    §5 Robustness tables
    Appendix: site analysis

The 4 axes of variation — what we can turn

All core analyses run the same GenericML pipeline. A finding becomes a paper result only if it persists when we change these 4 knobs.

A · Sample (which periods?)

P1–P5 (full panel) done

P1–P2 (pre-attrition) pending

P1 only (cross-section) pending

D-003: attrition differential starts P3 (Control 73% vs TE/EB 37%)
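The sample axis can be sketched as two small helpers: one that restricts a long-format panel to the first periods (P1–P2 or P1 only), and one that tabulates, per arm and period, the share of baseline firms still observed, which is one way to see where differential attrition begins. This is a minimal sketch; the field names `firm_id`, `arm` and `period` are hypothetical, and the toy rows are illustrative, not study data.

```python
from collections import defaultdict

def restrict_to_periods(rows, last_period):
    """Keep only observations with period <= last_period (e.g. 2 for P1-P2)."""
    return [r for r in rows if r["period"] <= last_period]

def retention_by_arm(rows, baseline_period=1):
    """Share of each arm's baseline firms that are observed in each period."""
    baseline = defaultdict(set)   # arm -> firm ids present at baseline
    observed = defaultdict(set)   # (arm, period) -> firm ids present in that period
    for r in rows:
        observed[(r["arm"], r["period"])].add(r["firm_id"])
        if r["period"] == baseline_period:
            baseline[r["arm"]].add(r["firm_id"])
    return {
        (arm, period): len(ids & baseline[arm]) / len(baseline[arm])
        for (arm, period), ids in observed.items()
        if baseline[arm]  # skip arms with no baseline firms
    }

# Toy panel (hypothetical field names and values, for illustration only).
toy_rows = [
    {"firm_id": 1, "arm": "Control", "period": 1},
    {"firm_id": 2, "arm": "Control", "period": 1},
    {"firm_id": 3, "arm": "TE", "period": 1},
    {"firm_id": 4, "arm": "TE", "period": 1},
    {"firm_id": 1, "arm": "Control", "period": 2},
    {"firm_id": 2, "arm": "Control", "period": 2},
    {"firm_id": 3, "arm": "TE", "period": 2},
    {"firm_id": 4, "arm": "TE", "period": 2},
    {"firm_id": 1, "arm": "Control", "period": 3},
    {"firm_id": 3, "arm": "TE", "period": 3},
    {"firm_id": 4, "arm": "TE", "period": 3},
]

if __name__ == "__main__":
    for key, share in sorted(retention_by_arm(toy_rows).items()):
        print(key, share)
    print(len(restrict_to_periods(toy_rows, 2)), "rows in the P1-P2 sample")
```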

B · Outcome Y

log_sales, any_sales, log_emp, log_sales_per_emp, tfp done

si_theory_stock, si_emp_stock done

si_thhp_stock pending

log_sales winsorized robustness

C · Comparison arm

TE vs Control done

TE vs EB done

Both comparisons embedded in 07 and 08 scripts

D · GATES quantile spec

2-groups (median split) done

3-groups (terciles) ← main done

4-groups (quartiles) done

Top-20% vs Bottom-20% pending
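The four quantile specs differ only in how units are binned by their ML proxy score. The sketch below shows that binning (equal-sized rank groups, and a top/bottom tail split) with a raw treated-minus-control mean per bin. It is a stand-in for illustration only: the actual GATES estimates come from a weighted regression inside the GenericML pipeline, and all function names and the toy data here are assumptions.

```python
from statistics import mean

def assign_groups(scores, n_groups):
    """Split units into n_groups near-equal bins by score rank (1 = lowest scores)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // n + 1
    return groups

def tail_groups(scores, share=0.20):
    """Label the bottom and top `share` of units; the middle is left as None."""
    n = len(scores)
    k = int(n * share)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [None] * n
    for i in order[:k]:
        labels[i] = "bottom"
    for i in order[n - k:]:
        labels[i] = "top"
    return labels

def group_effects(y, d, groups):
    """Raw treated-minus-control mean of y within each labelled group."""
    effects = {}
    for g in sorted({g for g in groups if g is not None}):
        treated = [yi for yi, di, gi in zip(y, d, groups) if gi == g and di == 1]
        control = [yi for yi, di, gi in zip(y, d, groups) if gi == g and di == 0]
        effects[g] = mean(treated) - mean(control)
    return effects

# Toy data (illustrative only, not study results): the effect grows with the score.
scores = [s / 10 for s in range(-5, 5)]
d = [1, 0] * 5
y = [1.0 + di * si for di, si in zip(d, scores)]

if __name__ == "__main__":
    for n_groups in (2, 3, 4):
        print(n_groups, "groups:", group_effects(y, d, assign_groups(scores, n_groups)))
    print("top/bottom 20%:", group_effects(y, d, tail_groups(scores, 0.20)))
```

A pattern that is real should show the same ordering of group effects under every binning, which is what the persistence criterion asks for.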

Complementary analyses (beyond GenericML)

Status legend: done = completed; pending = next to run; explore = exploratory / appendix

ID	Analysis	What it answers	Status	Priority
OLS Parametric Triangulation
P-01	`Y ~ D × startup_founded + i.rct + i.period`	Confirms CLAN pattern with interpretable coefficient	pending	High
P-02	`Y ~ D × work_exp_year + controls`	Confirms experience pattern (continuous)	pending	High
P-03	`Y ~ D × college + controls`	Confirms education paradox	pending	High
P-04	`Y ~ D × managerial_exp + controls`	Managerial vs general experience	pending	Medium
P-05	`Y ~ D × practitioner_type + controls`	Composite index: does the "type" have a single effect?	pending	High
Types Analysis — "Reason about types" (Arnaldo)
T-01	Define practitioner type: startup_founded=1 AND work_exp > median AND college=0	Is there a coherent "type" in the data?	pending	High
T-02	Simple ATE by type: E[Y|D=1,type=1] − E[Y|D=0,type=1]	How big is the gain for the practitioner type?	pending	High
T-03	Distribution of types by RCT site	Does site composition explain country-level differences?	pending	Medium
Site-Level HTE — exploratory (CGJ 2025 shows significant site variation)
S-01	Site dummies already in Z_i — check if rct appears in top CLAN variables	Does site membership predict benefit group?	implicit	Check outputs
S-02	ATE by site: `Y ~ D + D×i.rct + controls`	Is the ATE null uniform across sites? (CGJ: Colombia > India)	explore	Medium
S-03	Type distribution by site — does practitioner type cluster by country?	Compositional explanation for site-level differences	explore	After T-01
Mechanism
M-01	CLAN comparison: performance vs SI absorption (09_clan_comparison.R)	Do the same people benefit in both tracks?	done	—
M-04	Narrative mechanism: same type in both tracks → same story	Informal mediation without IV: theory works when implementable	pending	High (paper §4)
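Rows T-01, T-02 and P-05 in the table above are closely linked, which a short sketch makes concrete: define the type with the T-01 rule, take the treated-minus-control mean within each type (T-02), and note that the difference between the two is exactly the interaction coefficient of a saturated `Y ~ D × practitioner_type` regression with no controls. Once `i.rct`, `i.period` or other controls are added, the OLS coefficient will generally differ. The sketch assumes the median is taken over the pooled sample; the toy rows are illustrative, not study data.

```python
from statistics import mean, median

def practitioner_type(rows):
    """T-01 rule: serial founder AND above-median work experience AND no college."""
    cutoff = median(r["work_exp_year"] for r in rows)  # pooled-sample median (assumption)
    return [
        int(r["startup_founded"] == 1 and r["work_exp_year"] > cutoff and r["college"] == 0)
        for r in rows
    ]

def ate(rows, outcome="log_sales"):
    """Raw treated-minus-control difference in mean outcome."""
    treated = [r[outcome] for r in rows if r["D"] == 1]
    control = [r[outcome] for r in rows if r["D"] == 0]
    return mean(treated) - mean(control)

def ate_by_type(rows, outcome="log_sales"):
    """T-02: ATE within each type, plus their difference (the no-controls interaction)."""
    types = practitioner_type(rows)
    result = {}
    for t in (0, 1):
        subset = [r for r, ty in zip(rows, types) if ty == t]
        result[t] = ate(subset, outcome)
    result["interaction"] = result[1] - result[0]
    return result

def make_row(founded, exp, college, d, y):
    return {"startup_founded": founded, "work_exp_year": exp,
            "college": college, "D": d, "log_sales": y}

# Toy rows (illustrative only, not study results).
toy_rows = [
    make_row(1, 12, 0, 1, 2.0), make_row(1, 12, 0, 0, 1.0),
    make_row(1, 15, 0, 1, 3.0), make_row(1, 15, 0, 0, 1.0),
    make_row(0, 2, 1, 1, 1.0), make_row(0, 2, 1, 0, 1.0),
    make_row(0, 3, 1, 1, 1.0), make_row(0, 3, 1, 0, 1.0),
]

if __name__ == "__main__":
    print(ate_by_type(toy_rows))
```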

Persistence criterion — what enters the paper (D-016)

A finding enters the paper if it passes all 4 filters:

GATES robustness: appears in ≥3 quantile specs (2g / 3g / 4g)

Outcome breadth: significant in ≥2 outcomes of the same track

Sample robustness: survives when restricted to P1–P2 (pre-attrition)

Parametric confirmation: confirmed with OLS interaction (interpretable β)

Target: 2–3 robust patterns. Not a boilerplate list of effects.

Current candidates: startup_founded · work_exp_year · managerial_exp · age · education/college (all *** in ≥4 performance outcomes)
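The four filters can be written down as an explicit checklist, so that each candidate pattern is screened the same way. This is a minimal sketch: the `Evidence` record and its field names are hypothetical, and the example values are placeholders rather than results.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    gates_specs: int            # quantile specs (2g / 3g / 4g / ...) where the pattern appears
    outcomes_significant: int   # significant outcomes within the same track
    holds_p1p2: bool            # survives the P1-P2 (pre-attrition) sample
    ols_confirmed: bool         # confirmed by the OLS interaction

def failed_filters(e):
    """Return the names of the persistence filters that the evidence does not pass."""
    checks = {
        "GATES robustness": e.gates_specs >= 3,
        "Outcome breadth": e.outcomes_significant >= 2,
        "Sample robustness": e.holds_p1p2,
        "Parametric confirmation": e.ols_confirmed,
    }
    return [name for name, ok in checks.items() if not ok]

def enters_paper(e):
    """A finding enters the paper only if it passes all four filters."""
    return not failed_filters(e)

if __name__ == "__main__":
    # Placeholder values for a hypothetical pattern (not a study result).
    pattern_x = Evidence(gates_specs=3, outcomes_significant=4,
                         holds_p1p2=False, ols_confirmed=False)
    print(enters_paper(pattern_x), failed_filters(pattern_x))
```

Reporting which filters fail, not only the final verdict, makes it clear which pending analysis each candidate is waiting on.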

What each analysis feeds into the paper

Section	Paper output	Source script	Content
§3 Main Results	BLP table	07_performance_hte.R + 08_si_hte.R	β₁ (ATE) and β₂ (HTE test), all outcomes × comparisons
§3 Main Results	GATES figure (3g)	(none listed)	G1 / G2 / G3 bars for log_sales; shows who gains and who doesn't
§3 Main Results	CLAN: portrait of the type	(none listed)	Top 3 persistent CLAN vars; the "experienced practitioner" profile
§3 Robustness in-text	OLS parametric table	11_ols_interactions.do	D × var_CLAN coefficients, interpretable magnitudes
§3 Types	ATE by practitioner type	12_types_analysis.R	"Type gains X log_sales; non-type gains 0"
§4 Mechanism	SI mechanism narrative	09_clan_comparison.R	Same type absorbs more SI → can implement theory
§5 Robustness	P1–P2 robustness	10_robustness_p1p2.R	CLAN patterns survive pre-attrition sample
Appendix	Site-level HTE	13_site_hte.R (if warranted)	Composition of types by country

Execution order

#	Task	Script	Block
1	Verify `i.rct` in 05_ols_interaction_robustness.do	quick check	A — robustness
2	Rerun 07+08 with P1–P2 only	10_robustness_p1p2.R	A — robustness
3	Add top-20%/bottom-20% to 07 and 08	edit 07_performance_hte.R + 08_si_hte.R	A — robustness
4	OLS interaction table (top-5 CLAN vars, all outcomes, with i.rct)	11_ols_interactions.do	A — robustness
5	Define practitioner type + ATE by type	12_types_analysis.R	B — types
6	Site-level ATE + type distribution by site	12_types_analysis.R (add section)	B — types
7	Add si_thhp_stock to 08_si_hte.R	edit 08_si_hte.R	C — completeness
8	Causal Forest (optional, pre-submission)	new script	D — final robustness

Note on rct / site controls

In GenericML (07 and 08): site dummies are in Z_i (lines 38–44 of 07_performance_hte.R). Y is residualized on period FE only — not site FE. This is deliberate: absorbing site variation in Y would prevent studying cross-site heterogeneity (CGJ 2025, p.66 explicit note). Site effects are absorbed within the ML nuisance models.
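The consequence of residualizing on period FE only can be seen in a small sketch. Regressing Y on period dummies alone is equivalent to subtracting period means, so any level difference between sites stays in the residual and remains available for the heterogeneity analysis. The function name and toy values are illustrative assumptions, not the code in 07_performance_hte.R.

```python
from collections import defaultdict
from statistics import mean

def residualize_on_period(y, period):
    """Subtract the period mean from each outcome (period FE only, no site FE)."""
    by_period = defaultdict(list)
    for yi, p in zip(y, period):
        by_period[p].append(yi)
    period_mean = {p: mean(values) for p, values in by_period.items()}
    return [yi - period_mean[p] for yi, p in zip(y, period)]

def mean_by(values, labels):
    """Mean of values within each label."""
    grouped = defaultdict(list)
    for v, lab in zip(values, labels):
        grouped[lab].append(v)
    return {lab: mean(vs) for lab, vs in grouped.items()}

# Toy data (illustrative only): site B sits above site A in both periods.
y = [1.0, 3.0, 10.0, 14.0]
period = [1, 1, 2, 2]
site = ["A", "B", "A", "B"]

if __name__ == "__main__":
    resid = residualize_on_period(y, period)
    print("residuals:", resid)
    print("period means of residuals:", mean_by(resid, period))  # removed
    print("site means of residuals:", mean_by(resid, site))      # still differ
```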

In OLS triangulation (Stata): i.rct must be included in all interaction specs. Verify in 05_ols_interaction_robustness.do before running 11_ols_interactions.do.

CGJ 2025 reference: finds significant site heterogeneity (p=0.001 in CLAN). Colombia and Italy show highest effects; India and China lowest. Check if this aligns with the practitioner-type composition by site once T-01 is done.