How it works — 7 steps

Two AI models, run as two decorrelated sleeves (up to 5 S&P 500 names in total). After a 6-month minimum hold, each sleeve re-picks only when none of its names is still in its top-K pool. Cash on crash.

Universe. Use the actual S&P 500 list at the rebalance date, never today's list (~500 stocks per month). The historical panel is the augmented point-in-time (PIT) S&P 500 dataset (1,994 tickers, including 161 acquired/renamed large-caps backfilled from FNSPID + yfinance with date-overlap validation against PIT membership). The live universe is loaded from the iShares Core S&P 500 ETF (IVV) holdings.
Features. 79 price-only signals per stock (momentum, trend, recovery, RSI, volatility, drawdowns, cross-sectional rank trajectories, Mahalanobis distance to a pre-runner archetype, etc.) ranked relative to peers each month.
Primary model — walk-forward GBM. Gradient Boosted Trees, retrained every January with a 7-month embargo, produce 1-, 3- and 6-month forward-return rank predictions. Two scorers are derived: ml_3plus6 (mean of the 3m & 6m rank) and a 50/50 blend of the multi-horizon-consensus rank with ml_3plus6. The strategy runs two sleeves that swap which scorer drives stock selection vs rebalance timing — see the "Why two decorrelated sleeves (E2)" note below (deployed May 2026).
Confirmation model. Amazon's Chronos-bolt-tiny, a 9M-parameter zero-shot time-series foundation model distributed via HuggingFace. It forecasts a 3-month price distribution for each stock from its 252-day daily price history. Filter: require a Chronos p70 cross-sectional rank ≥ 0.45, which eliminates the bottom 45% by the Chronos p70 forecast.
Crash gate. SPY drops 8% in 21 days, or 5% over 6 months with bad recent action → hold cash next month.
Pick & weight. From the Chronos-filtered pool, Sleeve A picks the top 2 and Sleeve B the top 2–3 (conviction-adaptive: it widens to 3 when scores are bunched) by each sleeve's GBM score. Picks are inverse-volatility weighted with a 40% cap per pick. The two sleeves are held 50/50, so the combined book holds up to 5 names.
Hold & check. Hold the basket for at least 6 months. After that, re-evaluate every month: if any current pick is still in the new top-K eligible pool, that sleeve keeps its basket; only swap when none is. Force rebalance at 24 months max. ~1.0-1.5× annual turnover.
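
The crash gate can be sketched as a small predicate over SPY daily closes. This is a minimal sketch under stated assumptions, not the deployed rule: windows are counted in trading days (126 days standing in for 6 months), drops are measured point-to-point, and "bad recent action" is modelled as a negative return over the last 21 days. The text does not define that last condition, so treat it as a placeholder; the function name `crash_gate` is hypothetical.

```python
from typing import Sequence


def crash_gate(spy_closes: Sequence[float],
               fast_window: int = 21,
               fast_drop: float = 0.08,
               slow_window: int = 126,
               slow_drop: float = 0.05) -> bool:
    """Return True if the next month should be held in cash (sketch).

    Assumptions: windows are in trading days (126 ~ 6 months), drops are
    point-to-point, and "bad recent action" is a placeholder meaning a
    negative return over the fast window.
    """
    if len(spy_closes) <= slow_window:
        raise ValueError("need more than slow_window closes")
    last = spy_closes[-1]
    fast_ret = last / spy_closes[-1 - fast_window] - 1.0
    slow_ret = last / spy_closes[-1 - slow_window] - 1.0
    fast_crash = fast_ret <= -fast_drop
    # Placeholder for "bad recent action": the fast-window return is negative.
    slow_crash = slow_ret <= -slow_drop and fast_ret < 0.0
    return fast_crash or slow_crash
```

Keeping the gate as a pure function of the price history makes it easy to evaluate once per month-end, before any picking happens.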

Walk-forward validation (10 splits, OOS test)

10 TRAIN/TEST splits covering 2003–2026: three expanding-window (A1/A2/A3), six 3y rolling (R1–R6) spanning the GFC, COVID, the 2022 bear and the 2023-24 AI rally, plus one strict last-block holdout (STRICT). The model is fit only on TRAIN data, with a 7-month embargo gap between the end of TRAIN and the start of TEST.
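
The embargo gap can be illustrated with a month-indexed expanding-window split. A minimal sketch, assuming a training label for month t looks h months ahead (covering months t+1 to t+h): dropping the 7 months before TEST means no label with h ≤ 7, including the 6-month horizon, can reach into the TEST window. The helper names (`embargoed_split`, `labels_leak`) are hypothetical, and the rolling splits would additionally cap how far back TRAIN starts.

```python
from typing import List, Tuple

Month = Tuple[int, int]  # (year, month) with month in 1..12


def month_index(m: Month) -> int:
    """Map (year, month) to a running month count."""
    year, month = m
    return year * 12 + (month - 1)


def index_month(i: int) -> Month:
    """Inverse of month_index."""
    return (i // 12, i % 12 + 1)


def embargoed_split(train_start: Month, test_start: Month, test_end: Month,
                    embargo_months: int = 7) -> Tuple[List[Month], List[Month]]:
    """Expanding-window TRAIN/TEST split that drops `embargo_months` before TEST."""
    lo = month_index(train_start)
    test_lo, test_hi = month_index(test_start), month_index(test_end)
    train_hi = test_lo - embargo_months - 1
    if train_hi < lo or test_hi < test_lo:
        raise ValueError("empty TRAIN or TEST window")
    train = [index_month(i) for i in range(lo, train_hi + 1)]
    test = [index_month(i) for i in range(test_lo, test_hi + 1)]
    return train, test


def labels_leak(train: List[Month], test_start: Month, horizon_months: int) -> bool:
    """True if a TRAIN label covering months t+1..t+h reaches the TEST window."""
    first_test = month_index(test_start)
    return any(month_index(m) + horizon_months >= first_test for m in train)
```

For example, with a January 2013 test start, TRAIN ends in May 2012, so even the 6-month label of the last training month (June to November 2012) stops short of the TEST window.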

The honest DCA reading is in the Walk-forward section above (money-weighted return vs S&P-DCA per window). The lump-sum "10/10 beat SPY" figure is engine evidence that the picker is real and not overfit — it survives the GFC, COVID, the 2022 bear and the 2023-24 AI rally out-of-sample — but a monthly contributor does not experience per-window lump-sum CAGR, so we do not headline it. Full breakdown in the IMPROVEMENTS.md research report.

Why rule-based rebalance instead of fixed 6-month?

Timing-luck mitigation. The deployed v5 used to rebalance mechanically every 6 months — twice a year. That schedule was exposed to rebalance-date luck: if the two annual entry dates happened to land on bad picking moments, the year underperformed even when the picker's average quality was fine. 2024 was the worst case: Jan-31 and Jul-31 lump-sum entries gave -25pp edge vs SPY for the year, even though the other 10 monthly entry dates averaged +5.7pp positive.

The rule-based rebalance fixes this. We hold each basket for at least 6 months (preserving the GBM's 3-6m prediction horizon), then re-evaluate every month-end. We rebalance EARLIER only when the picker has discovered that none of a sleeve's current picks is still in its top-K eligible pool. If at least one still qualifies, that sleeve keeps holding.
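
The hold-and-check rule reduces to a per-sleeve predicate evaluated at each month-end. A minimal sketch, assuming the top-K eligible pool is simply the K best-scoring names after the Chronos filter, with ties broken alphabetically; the function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def top_k_pool(scores: Dict[str, float], k: int) -> Set[str]:
    """K best-scoring tickers (ties broken alphabetically for determinism)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def should_rebalance(months_held: int, current_picks: Iterable[str], pool: Set[str],
                     min_hold: int = 6, max_hold: int = 24) -> bool:
    """Hold at least `min_hold` months and force a swap at `max_hold`;
    in between, swap only when none of the current picks is still in the pool."""
    if months_held < min_hold:
        return False
    if months_held >= max_hold:
        return True
    return not any(ticker in pool for ticker in current_picks)
```

A single surviving name is enough to keep the whole basket, which is what keeps turnover low between the 6-month floor and the 24-month ceiling.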

The rule-based schedule beats the old fixed-6-month one on rebalance-timing robustness (it materially fixed the 2024 timing-luck problem) at lower turnover (~18 early rebalances across 25 baskets vs 38 fixed). Honesty note: an earlier simplified sweep reported a flattering Sharpe 1.10 / −34.5% drawdown for this rule from a simulation whose NaN handling inflated the result. The canonical reference, which also produces the DCA numbers throughout this page, is the honest one: deep interim drawdowns (~−77% on the lump-sum picker, ~−56% on the accumulating monthly-DCA portfolio under E2). The rule improves when we rotate; it does not remove the drawdown. Pure quarterly rotation was also tested and rejected (too frequent for the 3-6m signal horizon).

Why K = 2 per sleeve (and Sleeve B's adaptive 2→3)?

The augmented-PIT parameter sweep settled the base-K question. Within the S&P 500 cohort the cross-sectional alpha is concentrated at the top of the score distribution. K=15 dilutes alpha with mid-percentile names; K=3 (an earlier deployment) was tuned on the biased v2 panel and is sub-optimal once the universe is PIT-corrected. K=1 maximises the edge but the drawdown blows out (single-name fragility). K=2 is the per-sleeve sweet spot.

E2 runs two such sleeves. Sleeve A (WIN1) always holds the top 2. Sleeve B is conviction-adaptive: it holds 2 when the cross-sectional score gap signals a genuine standout, and widens to 3 when the top scores are bunched (the picker is effectively guessing, historically in the weaker years), so one low-conviction name can't dominate. Held 50/50, the combined live book holds up to 5 names. This adaptive breadth is one of the two levers that make E2 a strict improvement on the prior single-deploy E1 (validated in IMPROVE_PICK_RCD_FINDINGS.md).
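
Sleeve B's adaptive breadth and the capped inverse-volatility weighting can be sketched together. The conviction measure here (the gap between the 2nd- and 3rd-best scores) and its `gap_threshold` are hypothetical stand-ins; the text does not give the deployed rule. The weighting caps each name at 40% and redistributes the excess over the uncapped names; if every name hits the cap (for example, two equal-volatility names), this sketch leaves the remainder uninvested, which is an assumption rather than documented behavior.

```python
from typing import Dict, Sequence


def adaptive_k(scores_desc: Sequence[float], gap_threshold: float,
               base_k: int = 2, wide_k: int = 3) -> int:
    """Hold base_k names when the top scores stand out, else widen to wide_k.

    Hypothetical conviction test: the gap between the base_k-th and the
    (base_k + 1)-th best score, with scores sorted in descending order.
    """
    if len(scores_desc) <= base_k:
        return len(scores_desc)
    gap = scores_desc[base_k - 1] - scores_desc[base_k]
    return base_k if gap >= gap_threshold else min(wide_k, len(scores_desc))


def capped_inverse_vol_weights(vols: Dict[str, float], cap: float = 0.40) -> Dict[str, float]:
    """Inverse-volatility weights with a per-name cap (water-filling).

    Names whose weight would exceed `cap` are pinned at `cap`, and the remaining
    budget is re-spread over the other names in proportion to 1/vol.
    """
    free = {t: 1.0 / v for t, v in vols.items()}
    weights: Dict[str, float] = {}
    budget = 1.0
    while free:
        total = sum(free.values())
        trial = {t: budget * x / total for t, x in free.items()}
        over = [t for t, w in trial.items() if w > cap]
        if not over:
            weights.update(trial)
            break
        for t in over:
            weights[t] = cap
            budget -= cap
            del free[t]
    return weights
```

Re-running the proportional split after each round of capping matters: pinning one name can push another over the cap once its share of the leftover budget grows.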

K=2 is chosen for the strength of the DCA edge, not for low drawdown — drawdowns are deep at every K. The honest E2 figure is ~−77% on the underlying lump-sum picker / ~−56% on the accumulating monthly-DCA portfolio (disclosed in the DCA section above), and that deep interim drawdown is the price of the upside. K=2 also dominates K=3 across every level of the Monte-Carlo synthetic-delisting overlay (α 0–20%/yr): fewer picks means fewer monthly delist exposures while the picker edge per surviving pick dominates the loss per delist event. See IMPROVEMENTS.md.

Survivorship-bias overlay (Monte Carlo)

Even on the augmented PIT panel (1,994 tickers, 72% coverage in 2003 → 99.7% in 2025), there are still ~213 historical OTC bankruptcy-Q tickers (AAMRQ, LEHMQ, WAMUQ, ANRZQ, etc.) that free data cannot reach. We therefore run a Monte Carlo overlay in which each pick has an independent per-month probability, set by an annual hazard rate α, of being synthetically wiped to −100%.
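
The overlay can be sketched as follows. This is a simplified illustration, not the production harness: it assumes an equal-weight basket, converts the annual hazard α to a monthly probability via 1 − (1 − α)^(1/12), and assumes a wiped slot is refilled by the next month's picks. The input layout (`pick_returns`) and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def monthly_hazard(alpha_annual: float) -> float:
    """Monthly wipe probability whose 12-month survival equals 1 - alpha (assumed conversion)."""
    return 1.0 - (1.0 - alpha_annual) ** (1.0 / 12.0)


def growth(returns: Sequence[float]) -> float:
    """Compound a sequence of simple monthly returns into a growth multiple."""
    value = 1.0
    for r in returns:
        value *= 1.0 + r
    return value


def overlay_path(pick_returns: Sequence[Sequence[float]], alpha_annual: float,
                 rng: random.Random) -> List[float]:
    """Equal-weight basket returns with each pick independently wiped to -100%.

    pick_returns[m][i] is pick i's return in month m. A wiped pick loses its
    whole slot that month; the slot is assumed refilled the next month.
    """
    p = monthly_hazard(alpha_annual)
    basket = []
    for month in pick_returns:
        rets = [-1.0 if rng.random() < p else r for r in month]
        basket.append(sum(rets) / len(rets))
    return basket


def beat_probability(pick_returns: Sequence[Sequence[float]],
                     benchmark_returns: Sequence[float],
                     alpha_annual: float, n_paths: int = 1000, seed: int = 0) -> float:
    """Share of simulated paths whose growth exceeds the benchmark's."""
    rng = random.Random(seed)
    target = growth(benchmark_returns)
    wins = sum(growth(overlay_path(pick_returns, alpha_annual, rng)) > target
               for _ in range(n_paths))
    return wins / n_paths
```

Seeding a dedicated `random.Random` instance keeps each α sweep reproducible without touching global random state.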

Across hazard rates the picker keeps beating the S&P up to a high synthetic-delisting rate (well beyond the realistic ~2-4%/yr large-cap rate), and K=2 stays more robust than K=3 at every level tested — fewer picks means fewer monthly delist exposures. The per-α robustness verdicts are in the "Survivorship stress" section below; this is engine robustness, not the DCA return.

Augmented PIT S&P 500 dataset

The historical universe was reconstructed to fill the gap that the original v2 panel left: it was missing 374 of the 985 historical PIT S&P 500 tickers (only 51% coverage in 2003), mostly acquired or bankrupt names whose tickers Yahoo retired. The augmented panel adds 161 backfilled tickers (108 from the FNSPID Hugging Face dataset, CC BY-NC, and 53 from yfinance), including AGN (Allergan), ANTM (Anthem), ABMD (Abiomed), ALXN (Alexion), ATVI (Activision), BHGE→BKR (Baker Hughes), RTN→RTX (Raytheon), SYMC→GEN (Symantec), and others. Every backfill is validated by date-overlap against PIT membership to filter out ticker-reuse traps, where a retired symbol was later reassigned to a different company.
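
A date-overlap check of this kind can be sketched as a coverage test: a backfilled price series is accepted only if its date span covers most of the ticker's PIT membership interval(s). The 90% `min_coverage` threshold and the function names are hypothetical; the actual validation rule is documented in the PIT validation report.

```python
from datetime import date
from typing import Sequence, Tuple

Interval = Tuple[date, date]  # inclusive (start, end)


def overlap_days(a: Interval, b: Interval) -> int:
    """Number of calendar days shared by two inclusive date intervals."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return max(0, (end - start).days + 1)


def passes_overlap_check(price_span: Interval, memberships: Sequence[Interval],
                         min_coverage: float = 0.9) -> bool:
    """Accept a backfilled series only if it covers most of the PIT membership days.

    A series that mostly post-dates the membership (a reused ticker) fails.
    """
    member_days = sum((end - start).days + 1 for start, end in memberships)
    covered = sum(overlap_days(price_span, m) for m in memberships)
    return member_days > 0 and covered / member_days >= min_coverage
```

A ticker whose price history begins years after it left the index is the classic reuse signature, and it scores near-zero coverage under this test.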

Coverage lifts to 72% in 2003 → 99.7% in 2025. Full methodology and per-year coverage in the dataset README and the PIT validation report.

2024 timing-luck and the staggered alternative

The old fixed 6-month rebalance schedule meant only 2 entries per year, which exposed it to "rebalance-date luck". 2024 was the worst case at K=3: the Jan-31 and Jul-31 entries landed on the year's two worst picking moments (-25 pp edge vs SPY), even though the other 10 monthly entry dates averaged +5.7 pp. Diagnosed in TIMING_LUCK.md.

Moving to K=2 (and now the E2 two-sleeve design) reduces 2024's edge gap from -25 pp to -10.2 pp just by being more selective. A more aggressive alternative — crash-aware 6-tranche staggered DCA — was studied and offers further timing-luck mitigation, but at the cost of ~3 pp WF mean. Not deployed; documented for the curious.

The HuggingFace Chronos foundation model

The v5 strategy adds a confirmation signal from Amazon's Chronos-bolt-tiny, a 9M-parameter zero-shot time-series foundation model distributed via HuggingFace and trained on millions of generic time series. Given a stock's trailing 252-day daily prices, it produces a probabilistic 64-day forecast distribution. We take the 70th-percentile (p70) quantile of that forecast, an upside-leaning point on the distribution, and rank it cross-sectionally within the S&P 500 cohort each month.

The strategy only considers stocks where the Chronos p70 rank is ≥ 0.45 (i.e., the upper 55% by Chronos confidence). From that filtered pool, Sleeve A picks its top 2 and Sleeve B its top 2–3 (conviction-adaptive). The two AI models capture different aspects of the cross-section — Gradient Boosted Trees on tabular features, Chronos on raw price-shape dynamics — and their combination is more robust than either alone.
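
The filter-then-pick order can be sketched with plain dictionaries. A minimal sketch, assuming the p70 forecast has already been expressed as a return relative to the last close (so it is comparable across stocks), that percentile ranks run from 0 (lowest) to 1 (highest), and that names without a Chronos forecast are excluded. The function names are hypothetical.

```python
from typing import Dict, List


def percentile_ranks(values: Dict[str, float]) -> Dict[str, float]:
    """Cross-sectional percentile rank in [0, 1]; ties are broken by ticker (a simplification)."""
    order = sorted(values, key=lambda t: (values[t], t))
    n = len(order)
    if n == 1:
        return {order[0]: 1.0}
    return {t: i / (n - 1) for i, t in enumerate(order)}


def chronos_filtered_picks(p70_return: Dict[str, float], gbm_score: Dict[str, float],
                           k: int = 2, min_rank: float = 0.45) -> List[str]:
    """Drop names whose Chronos p70 rank is below `min_rank`, then take the top-k by GBM score."""
    ranks = percentile_ranks(p70_return)
    # Names with no Chronos forecast never enter the pool.
    pool = [t for t in gbm_score if t in ranks and ranks[t] >= min_rank]
    return sorted(pool, key=lambda t: (-gbm_score[t], t))[:k]
```

Because the filter runs first, a name with the best GBM score but a bottom-quantile Chronos rank is never picked; the two models act as a conjunction, not an average.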

Why two decorrelated sleeves (E2, deployed May 2026)?

Stock selection and rebalance timing are two separate decisions, and the GBM gives two useful scorers: ml_3plus6 (mean of the 3m & 6m forward-rank predictions) and a 50/50 blend of the multi-horizon-consensus rank with ml_3plus6. Sleeve A (WIN1) selects with ml_3plus6 and uses the blend as its drift trigger. Sleeve B (RCD + adaptive breadth) selects with the blend and triggers on ml_3plus6, with two extra, independently validated picking levers: the blend weight is regime-conditional (momentum-leaning in a confirmed bull, consensus-stable in normal/recovery regimes), and the basket breadth is conviction-adaptive (2 names when the cross-sectional score gap signals genuine conviction, 3 when scores are bunched). It is the same underlying alpha, but the two sleeves rebalance on different dates and hold different names, so their idiosyncratic noise partly cancels (E1's free-consistency lever). On top of that, Sleeve B's regime-timed blend adds forward return and its adaptive breadth adds year-to-year consistency; these levers are orthogonal and stack. We hold the sleeves 50/50 (the combined book holds up to 5 names). This is not a new alpha source: it is diversification plus regime- and conviction-aware sizing of one edge.
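
The two scorers and the sleeves' role swap can be sketched as follows. The text does not define the multi-horizon-consensus rank; the sketch assumes, hypothetically, that it is the plain mean of the 1m, 3m and 6m rank predictions. `consensus_weight=0.5` gives the 50/50 blend; Sleeve B's regime-conditional variant would vary that weight by regime, with values not stated here. All names are illustrative.

```python
from typing import Dict

Ranks = Dict[str, float]


def ml_3plus6(rank3: Ranks, rank6: Ranks) -> Ranks:
    """Mean of the 3-month and 6-month forward-rank predictions."""
    return {t: 0.5 * (rank3[t] + rank6[t]) for t in rank3}


def consensus_rank(rank1: Ranks, rank3: Ranks, rank6: Ranks) -> Ranks:
    """Hypothetical stand-in for the multi-horizon-consensus rank: a plain mean."""
    return {t: (rank1[t] + rank3[t] + rank6[t]) / 3.0 for t in rank1}


def blend(consensus: Ranks, m36: Ranks, consensus_weight: float = 0.5) -> Ranks:
    """Weighted blend; 0.5 is the 50/50 scorer, a regime-conditional weight would vary it."""
    w = consensus_weight
    return {t: w * consensus[t] + (1.0 - w) * m36[t] for t in m36}


# Which scorer each sleeve uses to select names vs to trigger a rebalance.
SLEEVE_ROLES = {
    "A": {"select": "ml_3plus6", "trigger": "blend"},
    "B": {"select": "blend", "trigger": "ml_3plus6"},
}
```

Swapping the roles is what decorrelates the sleeves: both draw on the same three rank predictions, but they disagree on which names to hold and on when to rotate.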

E2 cleared the full production-harness gauntlet (Chronos + inv-vol + regime gate + rule-based rebalance + canonical walk-forward + Monte-Carlo delisting overlay) and is a strict Pareto improvement on the prior single-deploy E1: full CAGR 51.9% → 56.6%, Sharpe 1.03 → 1.10, max interim drawdown −56% → −56% (unchanged — the 2008 GFC floor), walk-forward splits beating SPY 10/10, the worst rolling 5-year DCA outcome +11.7%/yr → +13.6%/yr, the forward CAGR excluding the front-loaded 2003–09 era 33.4% → 38.2%, and it beats S&P-DCA in all four non-overlapping eras (slimmest: recent 2021–2025).

Honest read: every headline metric except the max drawdown (unchanged) improves, and the rolling DCA win rate is a literal 100% at 10y (159/159), 99% at 5y and 96% at 3y. It is robust where it matters against overfitting: cost-insensitive (identical at 0–30 bps), a wide plateau around the 50/50 mix weight (0.3–0.7), more delisting-robust than E1, and the strongest truly out-of-sample holdout of any variant (untouched 2013–2026: Sharpe ~1.24 vs E1's 1.08). The −56% drawdown does not move: it is the 2008 GFC systemic event, which a stock-picking lever cannot reshape. The edge magnitude is still heavily front-loaded in 2003–2009, and interim drawdowns remain deep (~−56% on the accumulating account, ~−77% on the lump-sum picker). E2 raises return and consistency and narrows the dispersion at E1's drawdown; it does not make the strategy low-risk. Full detail and the overfit gauntlet in IMPROVE_PICK_RCE1_FINDINGS.md.

Other strategies & full research

Full v5 research, the augmented PIT dataset, the parameter sweeps (K, scorer, Chronos quantile, hold, cap, regime gate), the Monte-Carlo delisting overlay, and the reproduction guide all live in the experiments/monthly_dca/v5/spx_pit/ directory of the GitHub repo. The v2/v3/v4 baselines remain in their respective directories for comparison.

Survivorship stress — does the edge survive phantom failures?

The universe is already true point-in-time S&P 500 (delisted names are eligible while listed, then removed). This goes further: a Monte-Carlo overlay wipes random picks to −100% at rising hazard rates to test whether the picker's edge over the S&P survives if the price panel quietly missed any failures. Shown as a robustness verdict (does it still beat the S&P?), not a return number — the DCA outcome is the section at the top.

Edge survives delisting up to

Largest synthetic annual delisting hazard at which the picker still beats the S&P. The realistic large-cap rate is ~2-4%/yr.

At α=4%/yr (realistic)

Historical large/mid-cap delisting rate. Robustness verdict for the picker.

At α=8%/yr (2× historical)

Double the historical rate — a deliberately harsh stress.

At α=20%/yr (apocalypse)

Every pick has a 20%/yr phantom-wipe probability — far beyond any real regime. The breakage stress.

Robustness verdict at eight annual delisting rates

Pretend each pick has an annual probability α of being silently wiped to −100% (a synthetic delisting). Does the picker still beat the S&P? This is engine robustness — the investor outcome is the DCA section at the top.

Stress level	Robustness verdict

Why true PIT membership matters

The naive backtest uses today's S&P 500 list back-applied to all dates — but today's list is curated by S&P to remove failures. Companies like Lehman Brothers (S&P 500 1986-2008), Bear Stearns (1998-2008), AIG (1980-2008), GM (1925-2009), Washington Mutual (2002-2008), SVB (2009-2023), Bed Bath & Beyond (2003-2022), Sears (1956-2018) were all S&P 500 constituents during the periods they were collapsing — but they're absent from today's list. Restricting picks to today's list eliminates them automatically, gifting the backtest an enormous artificial edge.

The eligible universe is built from 2,595 daily snapshots of the S&P 500 from 1996 to 2019, plus 110 explicit add/remove change events from 2019 onward. 976 unique tickers were S&P 500 constituents at some point; ~500 per month. The eligible pool at, say, September 2008 is the actual September 2008 S&P 500 — Lehman Brothers is eligible to be picked, then auto-removed in October 2008 when it actually delisted. No hindsight.
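
A point-in-time membership lookup over snapshots plus change events can be sketched as: start from the latest snapshot on or before the query date, then replay any add/remove events dated after that snapshot and up to the query date. The event tuple format and the function name are hypothetical.

```python
import bisect
from datetime import date
from typing import Dict, List, Set, Tuple

# (effective date, "add" or "remove", ticker)
ChangeEvent = Tuple[date, str, str]


def members_on(day: date, snapshots: Dict[date, Set[str]],
               events: List[ChangeEvent]) -> Set[str]:
    """Index members on `day`: latest snapshot at or before `day`, plus later changes."""
    snap_dates = sorted(snapshots)
    i = bisect.bisect_right(snap_dates, day) - 1
    if i < 0:
        raise ValueError("day precedes the first snapshot")
    base = snap_dates[i]
    members = set(snapshots[base])
    for when, action, ticker in sorted(events):
        if base < when <= day:
            if action == "add":
                members.add(ticker)
            elif action == "remove":
                members.discard(ticker)
            else:
                raise ValueError(f"unknown action {action!r}")
    return members
```

Only information dated on or before the query day is used, so a name stays eligible until its removal event and disappears afterwards, with no hindsight.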

Two different "win rates" — which 100% is real and which is impossible

The impossible one: a ~100% monthly hit rate, or a 100% lump-sum win rate, with parabolic upside and no downside. That does not exist on this data and we will not claim it. The cross-sectional information coefficient (IC) within the S&P 500 cohort is ~0.04 for the GBM plus ~0.02 from the Chronos filter — small but real, and at the upper end of the ~0.06 literature ceiling for a price-only large-cap signal. Any strategy with finite IC has a ceiling; the monthly hit rate here is ~58% (barely above a coin-flip) and short-horizon drawdowns reach ~-77% on the lump-sum picker. We separately built and honestly killed multiple attempts to beat this (a Chronos-distribution downside model, conviction-adaptive concentration, vol-targeting and drawdown-breaker overlays) — they failed or bled too much return, confirming there is essentially one independent alpha here.
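
The information coefficient is conventionally measured as the cross-sectional Spearman rank correlation between a month's scores and the subsequent forward returns, then averaged across months. The sketch below assumes that convention, which the text does not spell out. It uses average ranks for ties and will divide by zero if either input is constant; the function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_ic(scores: Sequence[float], fwd_returns: Sequence[float]) -> float:
    """Spearman rank IC: Pearson correlation of the two rank vectors."""
    if len(scores) != len(fwd_returns) or len(scores) < 2:
        raise ValueError("need two equal-length series of at least 2 names")
    rs, rr = average_ranks(scores), average_ranks(fwd_returns)
    ms, mr = mean(rs), mean(rr)
    cov = sum((a - ms) * (b - mr) for a, b in zip(rs, rr))
    var_s = sum((a - ms) ** 2 for a in rs)
    var_r = sum((b - mr) ** 2 for b in rr)
    return cov / (var_s * var_r) ** 0.5
```

An IC of ~0.04 means the score ordering only slightly agrees with the realized return ordering in any given month, which is why a ~58% monthly hit rate is consistent with a real edge.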

The real one: a rolling-10-year DCA-vs-S&P-DCA win rate of 100%. These are completely different statistics. The first asks "is any single month/lump-sum a winner?" (no, not reliably). The second asks "if I contribute every month for 10 years, do I end up ahead of the same contributions into the S&P 500?" — and on PIT data 2003–2026 the answer was yes in 159 of 159 rolling 10-year windows (a literal 100% under the deployed E2 two-sleeve strategy; the worst such 10-year window still grew the money nearly 6×, and 99% of 5-year windows also won). A low monthly hit rate and a high multi-year DCA win rate are mathematically consistent: the big winners dominate the terminal portfolio while steady monthly contributions average through the drawdowns. The honest claim is the second one, with its multi-year-commitment, front-loading and interim-drawdown caveats stated plainly — never the first.
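
The two statistics can be compared directly with a small simulation. This sketch assumes $1 is contributed at the start of each month before that month's return is applied; the function names and the example series are illustrative, not the strategy's data.

```python
from typing import Sequence


def dca_terminal(monthly_returns: Sequence[float], contribution: float = 1.0) -> float:
    """Terminal value when `contribution` is added at the start of each month."""
    value = 0.0
    for r in monthly_returns:
        value = (value + contribution) * (1.0 + r)
    return value


def monthly_hit_rate(strategy: Sequence[float], benchmark: Sequence[float]) -> float:
    """Fraction of months in which the strategy's return beats the benchmark's."""
    return sum(s > b for s, b in zip(strategy, benchmark)) / len(strategy)


def rolling_dca_win_rate(strategy: Sequence[float], benchmark: Sequence[float],
                         window_months: int = 120) -> float:
    """Share of rolling windows in which strategy-DCA ends ahead of benchmark-DCA."""
    if len(strategy) != len(benchmark):
        raise ValueError("series must have equal length")
    n_windows = len(strategy) - window_months + 1
    if n_windows <= 0:
        raise ValueError("series shorter than the window")
    wins = sum(
        dca_terminal(strategy[s:s + window_months]) > dca_terminal(benchmark[s:s + window_months])
        for s in range(n_windows)
    )
    return wins / n_windows
```

For example, a made-up strategy returning +10%, −1%, −1% in repeating three-month cycles against a flat +0.5%/month benchmark wins only one month in three, yet its 24-month DCA account ends ahead, because the winning months compound.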

That the underlying picker stays positive out-of-sample across the GFC, COVID, the 2022 bear AND the 2023-24 AI rally is the strongest non-overfit evidence the engine is real — it is what makes the long-horizon DCA edge trustworthy rather than a backtest artifact.

A monthly DCA strategy that beat S&P-DCA in 100% of 10-year windows.
