A monthly DCA strategy that beat the identical S&P 500 DCA in 100% of rolling 10-year windows.

This is built for people who invest a fixed amount every month. Each month the strategy holds two diversified sleeves (up to ~6 S&P 500 names in total) driven by the same machine-learning model. Picks come only from the actual S&P 500 list at each date (no hindsight) and must be confirmed by the HuggingFace Chronos foundation model; each sleeve is held for at least 6 months, and a crash gate moves the book to cash. The honest question isn't "what's the CAGR"; it's "if I contribute every month, do I end up ahead of contributing the same into the S&P 500?"

On survivorship-bias-free PIT data, 2003–2026, contributing $1 every month: a 10-year DCA holder beat the identical S&P-500 DCA in 100% of all rolling windows (159 of 159) — and 99% of 5-year and 96% of 3-year windows — earning a median ~55%/yr money-weighted return (even the worst 10-year window still compounded at ~33%/yr and nearly 6×-ed the money) versus ~13%/yr for S&P-DCA. This is the honest version of a "high hit rate" — a multi-year commitment, not a monthly coin-flip.

Honest downside & the front-loading caveat: the edge is not uniform. Its magnitude is heavily concentrated in the 2003–2009 GFC-recovery era, a one-off window in which $1/mo became ~47× versus ~1.0× for S&P-DCA, and that will not repeat at that scale. The strategy holds two decorrelated sleeves of the same GBM+Chronos alpha ("E2", deployed May 2026): Sleeve A swaps which scorer drives selection vs rebalance timing so the sleeves' noise partly cancels, and Sleeve B adds a regime-conditional blend weight plus conviction-adaptive breadth. It beats S&P-DCA in all four non-overlapping eras, including the previously-weak 2010–2015 era and the recent 2021–2025 era (the slimmest margin). It still does not "beat the S&P every short period"; only the multi-year horizon is reliable historically. Short-horizon outcomes can still be brutal: the worst 1-year DCA window ended at ~0.51× of contributions, and peak-to-trough the lump-sum picker fell ~77% (the accumulating monthly account ~56%). There is no parabolic-upside-with-no-downside version; the data refuses it. A lower-volatility market-neutral drawdown-switch variant cuts the worst drawdown further for roughly half the upside. All of this is shown below, including a non-overlapping era-by-era table.

If you dollar-cost-average monthly — the honest outcome

Every metric most sites show you (CAGR, Sharpe) is a lump-sum number — it describes nobody who contributes monthly. This section is the only one that describes the actual user of this product: someone who invests a fixed amount every month. For every possible start month on survivorship-bias-free point-in-time data (2003–2026, no parameter tuning), we ask: did contributing into the strategy beat contributing the same schedule into the S&P 500?
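
The comparison described above can be sketched in a few lines. This is a minimal illustration, not the site's actual engine: it assumes each contribution lands at the start of a month and then earns that month's return, and the function names `dca_terminal_value` and `dca_beats_benchmark` are hypothetical.

```python
from typing import Sequence


def dca_terminal_value(monthly_returns: Sequence[float], contribution: float = 1.0) -> float:
    """Final value of an account funded with `contribution` every month.

    Assumption: the contribution is added first, then the whole balance
    earns that month's return.
    """
    value = 0.0
    for r in monthly_returns:
        value = (value + contribution) * (1.0 + r)
    return value


def dca_beats_benchmark(strategy_returns: Sequence[float],
                        benchmark_returns: Sequence[float]) -> bool:
    """True if the same contribution schedule ends higher in the strategy."""
    if len(strategy_returns) != len(benchmark_returns):
        raise ValueError("both series must cover the same months")
    return dca_terminal_value(strategy_returns) > dca_terminal_value(benchmark_returns)
```

Because both accounts receive the identical schedule, the only thing that differs is the sequence of monthly returns, which is the point of the S&P-DCA benchmark.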

Read this before anything else. The "100% win rate" here is a rolling 10-year DCA-vs-S&P-DCA statistic on historical PIT data (159 of 159 windows). It is NOT a promise, NOT a monthly hit rate (that's barely better than a coin-flip at ~58%), and NOT downside protection. Short-horizon outcomes were genuinely brutal (the worst 1-year window ended at ~0.51× of contributions; the accumulating monthly account fell ~56% peak-to-trough and the underlying lump-sum picker ~77%). Historically, the edge has compounded reliably only over a multi-year, every-month commitment that you do not interrupt during a crash. Past performance does not predict future results. Research, not financial advice.
Current basket
How rebalancing works: The basket is held for at least 6 months. After that, the picker re-runs every month-end. Each sleeve only swaps when none of its current picks is still in its new top-K eligible pool — i.e., the picker has discovered they're no longer best. If at least one current pick still qualifies, that sleeve keeps its basket for another month. Force rebalance every 24 months no matter what. Crash regime forces 100% cash regardless of basket age. Inverse-volatility weighted (lower-vol stocks get more weight) with a 40% cap per pick. Pre-tax numbers; ~1.0-1.5× annual turnover. Not financial advice.
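
The rule above can be written as a small decision function. This is a sketch under stated assumptions: the function name `sleeve_action`, the string return values, and the order in which the checks are applied are illustrative choices, not taken from the production code.

```python
from typing import Collection, Sequence

MIN_HOLD_MONTHS = 6    # minimum hold stated in the text
MAX_HOLD_MONTHS = 24   # forced rebalance stated in the text


def sleeve_action(current_picks: Sequence[str],
                  eligible_pool: Collection[str],
                  months_held: int,
                  crash_regime: bool) -> str:
    """Return "cash", "hold" or "rebalance" for one sleeve at month-end."""
    if crash_regime:
        return "cash"        # crash regime overrides basket age
    if months_held < MIN_HOLD_MONTHS:
        return "hold"        # still inside the minimum hold
    if months_held >= MAX_HOLD_MONTHS:
        return "rebalance"   # forced rotation
    if any(ticker in eligible_pool for ticker in current_picks):
        return "hold"        # at least one pick still qualifies
    return "rebalance"       # none of the picks is in the top-K pool
```

For example, `sleeve_action(["AAA", "BBB"], {"BBB", "CCC"}, 7, False)` keeps the basket because one name still qualifies.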

How it works — 7 steps

Two AI models, run as two decorrelated sleeves (up to ~6 S&P 500 names). Each sleeve re-picks when none of its names is still in its top-K pool. Cash on crash.

  1. Universe. Use the actual S&P 500 list at the rebalance date — never today's list. ~500 stocks per month. The historical panel is the augmented PIT S&P 500 dataset (1994 tickers, including 161 acquired/renamed large-caps backfilled from FNSPID + yfinance with date-overlap validation against PIT membership). The live current universe is loaded from iShares Core S&P 500 ETF (IVV) holdings.
  2. Features. 79 price-only signals per stock (momentum, trend, recovery, RSI, volatility, drawdowns, cross-sectional rank trajectories, Mahalanobis distance to a pre-runner archetype, etc.) ranked relative to peers each month.
  3. Primary model — walk-forward GBM. Gradient Boosted Trees, retrained every January with a 7-month embargo, produce 1-, 3- and 6-month forward-return rank predictions. Two scorers are derived: ml_3plus6 (mean of the 3m & 6m rank) and a 50/50 blend of the multi-horizon-consensus rank with ml_3plus6. The strategy runs two sleeves that swap which scorer drives stock selection vs rebalance timing — see the "Why two decorrelated sleeves (E2)" note below (deployed May 2026).
  4. Confirmation model. HuggingFace Chronos-bolt-tiny zero-shot foundation model (9M parameters). Forecasts 3-month price-distribution for each stock from its 252-day daily price history. Filter: require Chronos p70 cross-sectional rank ≥ 0.45 — eliminates the bottom 45% by Chronos confidence.
  5. Crash gate. SPY drops 8% in 21 days, or 5% over 6 months with bad recent action → hold cash next month.
  6. Sleeve A picks the top 2; Sleeve B picks the top 2–3 (conviction-adaptive — widens to 3 when scores are bunched) by its Gradient Boosted score from the Chronos-filtered pool, inverse-volatility weighted with a 40% cap per pick. The two sleeves are held 50/50, so the combined book is up to ~6 names.
  7. Hold & check. Hold the basket for at least 6 months. After that, re-evaluate every month: if any current pick is still in the new top-K eligible pool, that sleeve keeps its basket; only swap when none is. Force rebalance at 24 months max. ~1.0-1.5× annual turnover.
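
Steps 5 and 6 can be sketched as follows. Several details are assumptions made for illustration: the text does not define "bad recent action", so it is passed in as a caller-supplied flag; the cap is enforced by clipping and redistributing the excess to uncapped names; and when the cap makes full investment impossible (two names at 40% each), the remainder is simply left in cash. The function names are hypothetical.

```python
from typing import Dict


def crash_gate(spy_ret_21d: float, spy_ret_6m: float, recent_action_bad: bool) -> bool:
    """True means: hold cash next month."""
    fast_drop = spy_ret_21d <= -0.08                      # SPY down 8% in 21 days
    slow_drop = spy_ret_6m <= -0.05 and recent_action_bad  # 5% over 6 months plus flag
    return fast_drop or slow_drop


def inverse_vol_weights(vols: Dict[str, float], cap: float = 0.40) -> Dict[str, float]:
    """Weights proportional to 1/volatility, with no weight above `cap`."""
    raw = {t: 1.0 / v for t, v in vols.items()}
    total = sum(raw.values())
    weights = {t: r / total for t, r in raw.items()}
    capped = set()
    while True:
        over = [t for t in weights if t not in capped and weights[t] > cap + 1e-12]
        if not over:
            return weights
        capped.update(over)
        for t in capped:
            weights[t] = cap
        free = [t for t in weights if t not in capped]
        if not free:
            return weights  # assumption: unallocated remainder stays in cash
        budget = 1.0 - cap * len(capped)
        free_total = sum(raw[t] for t in free)
        for t in free:
            weights[t] = budget * raw[t] / free_total
```

With volatilities of 0.1, 0.2 and 0.4, the raw inverse-vol weights are roughly 57%, 29% and 14%; the cap trims the first to 40% and the other two become 40% and 20%.
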
Walk-forward validation (10 splits, OOS test)

10 TRAIN/TEST splits covering 2003-2026 — three expanding-window (A1/A2/A3), six 3y rolling (R1-R6) including the GFC, COVID, 2022 bear and 2023-24 AI rally, plus one strict last-block holdout (STRICT). The model is fit only on TRAIN data with a 7-month embargo gap.
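
A minimal sketch of the embargo rule, assuming monthly data keyed by the first day of each month and reading the rule as "training data must be strictly older than the test month minus 7 months". The helper names are hypothetical.

```python
from datetime import date
from typing import Iterable, List


def shift_months(d: date, months: int) -> date:
    """Move a first-of-month date forward or backward by whole months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def embargo_cutoff(test_month: date, embargo_months: int = 7) -> date:
    """First month that is too recent to be used for training."""
    return shift_months(test_month, -embargo_months)


def training_months(all_months: Iterable[date], test_month: date,
                    embargo_months: int = 7) -> List[date]:
    """Months strictly older than the embargo cutoff."""
    cutoff = embargo_cutoff(test_month, embargo_months)
    return [m for m in all_months if m < cutoff]
```

For a January 2024 test month the cutoff is June 2023, so May 2023 is the last month the model may train on. The gap keeps 6-month forward-return labels from overlapping the test period.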

The honest DCA reading is in the walk-forward DCA section below (money-weighted return vs S&P-DCA per window). The lump-sum "10/10 beat SPY" figure is engine evidence that the picker is real and not overfit: it survives the GFC, COVID, the 2022 bear and the 2023-24 AI rally out-of-sample. A monthly contributor does not experience per-window lump-sum CAGR, however, so we do not headline it. Full breakdown in the IMPROVEMENTS.md research report.

Why rule-based rebalance instead of fixed 6-month?

Timing-luck mitigation. The deployed v5 used to rebalance mechanically every 6 months, i.e. twice a year. That schedule was exposed to rebalance-date luck: if the two annual entry dates happened to land on bad picking moments, the year underperformed even when the picker's average quality was fine. 2024 was the worst case: the Jan-31 and Jul-31 lump-sum entries gave a -25pp edge vs SPY for the year, even though the other 10 monthly entry dates averaged a +5.7pp edge.

The rule-based rebalance fixes this. We hold each basket for at least 6 months (preserving the GBM's 3-6m prediction horizon), then re-evaluate every month-end. We rebalance EARLIER only when the picker has discovered that none of a sleeve's current picks is still in its top-K eligible pool. If at least one still qualifies, that sleeve keeps holding.

The rule-based schedule beats the old fixed-6-month one on rebalance-timing robustness (it materially fixed the 2024 timing-luck problem) at lower turnover (~18 early rebalances across 25 baskets vs 38 fixed). Honesty note: an earlier simplified sweep reported a flattering Sharpe 1.10 / −34.5% drawdown for this rule from a NaN-favorable sim. The canonical reference — and the DCA numbers throughout this page — is the honest one: deep interim drawdowns (~−77% lump-sum picker, ~−56% on the accumulating monthly-DCA portfolio under E2). The rule improves when we rotate; it does not remove the drawdown. Pure quarterly rotation was also tested and rejected (too frequent for the 3-6m signal horizon).

Why K = 2 per sleeve (and Sleeve B's adaptive 2→3)?

The augmented-PIT parameter sweep settled the base-K question. Within the S&P 500 cohort the cross-sectional alpha is concentrated at the top of the score distribution. K=15 dilutes alpha with mid-percentile names; K=3 (an earlier deployment) was tuned on the biased v2 panel and is sub-optimal once the universe is PIT-corrected. K=1 maximises the edge but the drawdown blows out (single-name fragility). K=2 is the per-sleeve sweet spot.

E2 runs two such sleeves. Sleeve A (WIN1) always holds the top 2. Sleeve B is conviction-adaptive: it holds 2 when the cross-sectional score gap signals a genuine standout, and widens to 3 when the top scores are bunched (the picker is effectively guessing — historically the weaker years), so one low-conviction name can't dominate. Held 50/50, the combined live book is up to ~6 names. This adaptive breadth is one of the two levers that make E2 a strict improvement on the prior single-deploy E1 (validated in IMPROVE_PICK_RCD_FINDINGS.md).
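
One way to picture Sleeve B's adaptive breadth is the sketch below. The text does not say how the "cross-sectional score gap" is measured or what threshold is used, so both are assumptions here: the gap is taken between the 2nd and 3rd ranked scores, and `gap_threshold` is a free parameter. The function name `adaptive_picks` is hypothetical.

```python
from typing import Dict, List


def adaptive_picks(scores: Dict[str, float], gap_threshold: float,
                   base_k: int = 2, wide_k: int = 3) -> List[str]:
    """Hold `base_k` names when the top names stand out, else widen to `wide_k`."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) <= base_k:
        return ranked
    # Assumed conviction measure: how far the last base pick sits above the next name.
    gap = scores[ranked[base_k - 1]] - scores[ranked[base_k]]
    k = base_k if gap >= gap_threshold else wide_k
    return ranked[:k]
```

When the scores are bunched, the third name is nearly as good as the second by the model's own estimate, so spreading across three limits the damage from any single low-conviction pick.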

K=2 is chosen for the strength of the DCA edge, not for low drawdown — drawdowns are deep at every K. The honest E2 figure is ~−77% on the underlying lump-sum picker / ~−56% on the accumulating monthly-DCA portfolio (disclosed in the DCA section above), and that deep interim drawdown is the price of the upside. K=2 also dominates K=3 across every level of the Monte-Carlo synthetic-delisting overlay (α 0–20%/yr): fewer picks means fewer monthly delist exposures while the picker edge per surviving pick dominates the loss per delist event. See IMPROVEMENTS.md.

Survivorship-bias overlay (Monte Carlo)

Even on the augmented PIT panel (1994 tickers, 72% coverage in 2003 → 99.7% in 2025), there are still ~213 historical OTC bankruptcy-Q tickers (AAMRQ, LEHMQ, WAMUQ, ANRZQ, etc.) that free data cannot reach. We run a Monte Carlo overlay where each pick has an independent per-month probability of being synthetically wiped to -100% at hazard rate α.
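
The overlay can be sketched as below. This is an illustration under assumptions, not the production harness: the annual hazard α is converted to a monthly probability with 1 − (1 − α)^(1/12), every pick-month is drawn independently, picks are equally weighted within a month, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def monthly_hazard(annual_alpha: float) -> float:
    """Monthly wipe probability equivalent to an annual hazard rate."""
    return 1.0 - (1.0 - annual_alpha) ** (1.0 / 12.0)


def apply_delisting_overlay(pick_returns: Sequence[Sequence[float]],
                            annual_alpha: float,
                            rng: random.Random) -> List[List[float]]:
    """Replace each pick's monthly return with -100% with probability p."""
    p = monthly_hazard(annual_alpha)
    return [[-1.0 if rng.random() < p else r for r in month] for month in pick_returns]


def mean_terminal_growth(pick_returns: Sequence[Sequence[float]], annual_alpha: float,
                         n_paths: int = 1000, seed: int = 0) -> float:
    """Average growth of $1 across simulated paths (equal-weight picks assumed)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        growth = 1.0
        for month in apply_delisting_overlay(pick_returns, annual_alpha, rng):
            growth *= 1.0 + sum(month) / len(month)
        total += growth
    return total / n_paths
```

Running `mean_terminal_growth` at several values of α and comparing each result with the benchmark's growth gives the kind of per-α verdict described in the text.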

Across hazard rates the picker keeps beating the S&P up to a high synthetic-delisting rate (well beyond the realistic ~2-4%/yr large-cap rate), and K=2 stays more robust than K=3 at every level tested — fewer picks means fewer monthly delist exposures. The per-α robustness verdicts are in the "Survivorship stress" section below; this is engine robustness, not the DCA return.

Augmented PIT S&P 500 dataset

The historical universe was reconstructed to fill the gap that the original v2 panel left: it was missing 374 of the 985 historical PIT S&P 500 tickers (only 51% coverage in 2003), mostly acquired or bankrupt names whose tickers Yahoo retired. The augmented panel adds 161 backfilled tickers — 108 from the FNSPID Hugging Face dataset (CC BY-NC) and 53 from yfinance — including AGN (Allergan), ANTM (Anthem), ABMD (Abiomed), ALXN (Alexion), ATVI (Activision), BHGE→BKR (Baker Hughes), RTN→RTX (Raytheon), SYMC→GEN (Symantec), and others. Every backfill is validated by date-overlap against PIT membership to filter ticker-reuse traps.

Coverage lifts to 72% in 2003 → 99.7% in 2025. Full methodology and per-year coverage in the dataset README and the PIT validation report.

2024 timing-luck and the staggered alternative

The 6-month rebalance schedule means 2 entries per year — exposed to "rebalance-date luck". 2024 was the worst case at K=3: Jan-31 and Jul-31 entries landed on the year's two worst picking moments (-25 pp edge vs SPY), even though the other 10 monthly entry dates averaged +5.7 pp positive. Diagnosed in TIMING_LUCK.md.

Moving to K=2 (and now the E2 two-sleeve design) reduces 2024's edge gap from -25 pp to -10.2 pp just by being more selective. A more aggressive alternative — crash-aware 6-tranche staggered DCA — was studied and offers further timing-luck mitigation, but at the cost of ~3 pp WF mean. Not deployed; documented for the curious.

The HuggingFace Chronos foundation model

The v5 strategy adds a confirmation signal from Amazon's Chronos-bolt-tiny, a 9M-parameter zero-shot time-series foundation model distributed via HuggingFace and trained on millions of generic time series. Given a stock's trailing 252-day daily prices, it produces a probabilistic 64-day forecast distribution (about three months of trading days, the "3-month" forecast in step 4). We take the 70th-percentile forecast (a proxy for Chronos's confidence in the upper end of the stock's forward distribution) and cross-sectionally rank it within the S&P 500 cohort each month.

The strategy only considers stocks where the Chronos p70 rank is ≥ 0.45 (i.e., the upper 55% by Chronos confidence). From that filtered pool, Sleeve A picks its top 2 and Sleeve B its top 2–3 (conviction-adaptive). The two AI models capture different aspects of the cross-section — Gradient Boosted Trees on tabular features, Chronos on raw price-shape dynamics — and their combination is more robust than either alone.
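
The filter-then-pick step can be sketched without calling the model itself. The code below assumes the p70 forecasts have already been produced upstream, uses a simple rank convention (position in ascending order divided by the number of names, ties broken arbitrarily), and uses hypothetical function names.

```python
from typing import Dict, List


def cross_sectional_rank(values: Dict[str, float]) -> Dict[str, float]:
    """Percentile rank in (0, 1]; the highest value gets 1.0."""
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    return {ticker: (i + 1) / n for i, ticker in enumerate(ordered)}


def select_picks(gbm_score: Dict[str, float], chronos_p70: Dict[str, float],
                 k: int = 2, min_rank: float = 0.45) -> List[str]:
    """Top-k names by GBM score among those passing the Chronos rank filter."""
    ranks = cross_sectional_rank(chronos_p70)
    pool = [t for t in gbm_score if ranks.get(t, 0.0) >= min_rank]
    return sorted(pool, key=gbm_score.get, reverse=True)[:k]
```

Note that a name with the highest GBM score is still dropped if its Chronos rank falls below 0.45, which is how the second model acts as a confirmation rather than a second ranking.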

Why two decorrelated sleeves (E2, deployed May 2026)?

Stock selection and rebalance timing are two separate decisions, and the GBM gives two useful scorers: ml_3plus6 (mean of the 3m & 6m forward-rank) and a 50/50 blend of the multi-horizon-consensus rank with ml_3plus6. Sleeve A (WIN1) selects with ml_3plus6 and uses the blend as its drift trigger. Sleeve B (RC D + adaptive breadth) selects with the blend (triggering on ml_3plus6) but with two extra, independently-validated picking levers: the blend weight is regime-conditional — momentum-leaning in a confirmed bull, consensus-stable in normal/recovery — and the basket breadth is conviction-adaptive, holding 2 names when the cross-sectional score gap signals genuine conviction and widening to 3 when scores are bunched. It is the same underlying alpha; the two sleeves rebalance on different dates and hold different names, so their idiosyncratic noise partly cancels (E1's free-consistency lever) while Sleeve B's regime-timed blend adds forward return and adaptive breadth adds year-to-year consistency — orthogonal levers that stack. We hold them 50/50 (the combined book is up to ~6 names). This is not a new alpha source — it is diversification plus regime/conviction-aware sizing of one edge.
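
The "noise partly cancels" argument is the standard two-asset volatility formula. The sketch below is generic portfolio arithmetic, not a calculation from the strategy's data; the volatility and correlation inputs in the usage note are purely illustrative.

```python
import math


def combined_volatility(vol_a: float, vol_b: float, correlation: float,
                        weight_a: float = 0.5) -> float:
    """Volatility of a two-sleeve mix with the given return correlation."""
    weight_b = 1.0 - weight_a
    variance = ((weight_a * vol_a) ** 2
                + (weight_b * vol_b) ** 2
                + 2.0 * weight_a * weight_b * correlation * vol_a * vol_b)
    return math.sqrt(max(variance, 0.0))
```

With two sleeves of equal volatility held 50/50, a correlation of 1.0 leaves the volatility unchanged, while any correlation below 1.0 lowers it; at a correlation of 0 the mix has about 71% of a single sleeve's volatility. Since both sleeves draw on the same alpha, their correlation is presumably well above zero, so the real reduction would be smaller.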

E2 cleared the full production-harness gauntlet (Chronos + inv-vol + regime gate + rule-based rebalance + canonical walk-forward + Monte-Carlo delisting overlay) and is a strict Pareto improvement on the prior single-deploy E1: full CAGR 51.9% → 56.6%, Sharpe 1.03 → 1.10, max interim drawdown −56% → −56% (unchanged — the 2008 GFC floor), walk-forward splits beating SPY 10/10, the worst rolling 5-year DCA outcome +11.7%/yr → +13.6%/yr, the forward CAGR excluding the front-loaded 2003–09 era 33.4% → 38.2%, and it beats S&P-DCA in all four non-overlapping eras (slimmest: recent 2021–2025).

Honest read: every headline return and consistency metric improves while the drawdown is unchanged, and the rolling DCA win rate is a literal 100% at 10y (159/159), 99% at 5y, 96% at 3y. It is robust where it matters against overfitting: cost-insensitive (identical 0–30 bps), a wide 50/50 mix-weight plateau (0.3–0.7), more delisting-robust than E1, and the strongest truly-out-of-sample holdout of any variant (untouched 2013–2026: Sharpe ~1.24 vs E1's 1.08). The −56% drawdown does not move: it is the 2008 GFC systemic event, which a stock-picking lever cannot reshape. E2 raises return and consistency at E1's drawdown; it does not make the strategy low-risk. The edge magnitude is still hugely front-loaded in 2003–2009 and the interim drawdowns are still deep (~−56% on the accumulating account, ~−77% on the lump-sum picker). E2 narrows the dispersion; it does not remove the risk. Full detail and the overfit gauntlet in IMPROVE_PICK_RCE1_FINDINGS.md.

Other strategies & full research

Full v5 research, the augmented PIT dataset, the parameter sweeps (K, scorer, Chronos quantile, hold, cap, regime gate), the Monte-Carlo delisting overlay, and the reproduction guide all live in the experiments/monthly_dca/v5/spx_pit/ directory of the GitHub repo. The v2/v3/v4 baselines remain in their respective directories for comparison.

Your DCA account — $1 contributed every month since 2003

This is the monthly-DCA experience, not lump-sum: contribute $1 every month-end into the strategy and watch the account accumulate, versus contributing the identical schedule into the S&P 500. The grey line is the money you put in; green is the strategy account; the S&P-DCA line is the same contributions into SPY. Honest walk-forward predictions throughout — the model only sees data older than (test month − 7 months) and Chronos only sees prices up to the rebalance date.

If you'd been DCA-ing for X years

If you'd contributed monthly starting X years ago and kept going to today: the annualized money-weighted return your contributions earned, and the same for identical contributions into the S&P 500.
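
An annualized money-weighted return can be found by solving for the constant monthly rate that turns the contribution schedule into the observed final value. This sketch solves it by bisection and assumes equal contributions at the start of each month; the function name and the search bounds are illustrative choices.

```python
def money_weighted_return(terminal_value: float, n_months: int,
                          contribution: float = 1.0) -> float:
    """Annualized rate at which monthly contributions grow to `terminal_value`."""

    def future_value(rate: float) -> float:
        # Contribution made at the start of month t compounds for (n_months - t) months.
        return sum(contribution * (1.0 + rate) ** (n_months - t) for t in range(n_months))

    lo, hi = -0.99, 1.0  # assumed search bounds on the monthly rate
    if not future_value(lo) <= terminal_value <= future_value(hi):
        raise ValueError("terminal value is outside the searchable range")
    for _ in range(200):  # bisection: future_value is increasing in the rate
        mid = (lo + hi) / 2.0
        if future_value(mid) < terminal_value:
            lo = mid
        else:
            hi = mid
    monthly = (lo + hi) / 2.0
    return (1.0 + monthly) ** 12 - 1.0
```

An account that ends exactly equal to the money contributed has a money-weighted return of zero, regardless of how long it ran.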

Year-by-year (DCA through each year)

For each calendar year: if you contributed $1 every month during that year, the % return on the money you put in by year-end, versus the identical monthly contributions into the S&P 500. This is the DCA experience within each year — not lump-sum calendar compounding.

Walk-forward — 10 out-of-sample windows, in DCA terms

10 strictly out-of-sample train/test windows (the model only sees data older than test − 7 months). Shown the honest way: if you'd contributed monthly through each window, did you end ahead of the same contributions into the S&P 500? The often-quoted "10/10 beat SPY" is a lump-sum per-window statistic no monthly contributor experiences — the DCA reframe below is what you'd actually have lived, and it is deliberately less flattering over short windows.

The picks that drove your DCA account

Every sleeve-basket the strategy has held since 2003, newest first (the strategy runs two such sleeves in parallel; your monthly contributions are split 50/50 across them). A sleeve-basket holds 2–3 stocks (Sleeve B widens to 3 when conviction is low). Each card shows that sleeve-basket's stocks and its hold-period return (a picker mechanic, not your DCA outcome — the investor result is the DCA section at the top). Each sleeve holds its basket at least 6 months and rotates when none of its names is still in its top-K eligible pool.

Historical monthly baskets

Browse every monthly basket the strategy has picked since inception. Use the year selector to jump to any past month and see what was bought, the entry/exit dates, and what happened.

Every trade since inception

Flat list of every pick — entry date, ticker, entry price, exit date (when none of a sleeve's picks remained in its top-K pool or 24-month max-hold reached), exit price, status (held vs exited), return, and the basket-id grouping. Sorted newest first.

Robustness — the DCA edge and the engine behind it

Sub-period robustness is shown in DCA terms (money-weighted return vs S&P-DCA). The universe- and parameter-generalisation checks below validate the picker that generates your DCA account — they are engine robustness, not a DCA return promise. Your DCA outcome is the section at the top of the page.

Survivorship stress — does the edge survive phantom failures?

The universe is already true point-in-time S&P 500 (delisted names are eligible while listed, then removed). This goes further: a Monte-Carlo overlay wipes random picks to −100% at rising hazard rates to test whether the picker's edge over the S&P survives if the price panel quietly missed any failures. Shown as a robustness verdict (does it still beat the S&P?), not a return number — the DCA outcome is the section at the top.

Edge survives delisting up to: the largest synthetic annual delisting hazard at which the picker still beats the S&P. The realistic large-cap rate is ~2-4%/yr.
At α=4%/yr (realistic): historical large/mid-cap delisting rate. Robustness verdict for the picker.
At α=8%/yr (2× historical): double the historical rate, a deliberately harsh stress.
At α=20%/yr (apocalypse): every pick has a 20%/yr phantom-wipe probability, far beyond any real regime. The breakage stress.

Robustness verdict at eight annual delisting rates

Pretend each pick has an annual probability α of being silently wiped to −100% (a synthetic delisting). Does the picker still beat the S&P? This is engine robustness — the investor outcome is the DCA section at the top.

Stress level | Robustness verdict

Why true PIT membership matters

The naive backtest uses today's S&P 500 list back-applied to all dates — but today's list is curated by S&P to remove failures. Companies like Lehman Brothers (S&P 500 1986-2008), Bear Stearns (1998-2008), AIG (1980-2008), GM (1925-2009), Washington Mutual (2002-2008), SVB (2009-2023), Bed Bath & Beyond (2003-2022), Sears (1956-2018) were all S&P 500 constituents during the periods they were collapsing — but they're absent from today's list. Restricting picks to today's list eliminates them automatically, gifting the backtest an enormous artificial edge.

The eligible universe is built from 2,595 daily snapshots of the S&P 500 from 1996 to 2019, plus 110 explicit add/remove change events from 2019 onward. 976 unique tickers were S&P 500 constituents at some point; ~500 per month. The eligible pool at, say, September 2008 is the actual September 2008 S&P 500 — Lehman Brothers is eligible to be picked, then auto-removed in October 2008 when it actually delisted. No hindsight.
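
A point-in-time lookup over snapshots plus change events might look like the sketch below. The data layout (date-sorted snapshot and change lists) and the function name `universe_at` are assumptions for illustration, and the tickers in the usage note are placeholders.

```python
from bisect import bisect_right
from datetime import date
from typing import Iterable, List, Set, Tuple

Snapshot = Tuple[date, Iterable[str]]
Change = Tuple[date, Iterable[str], Iterable[str]]  # (date, added, removed)


def universe_at(as_of: date, snapshots: List[Snapshot], changes: List[Change]) -> Set[str]:
    """Index members on `as_of`, using only information dated on or before it."""
    snapshot_dates = [d for d, _ in snapshots]
    i = bisect_right(snapshot_dates, as_of) - 1   # latest snapshot not after as_of
    if i < 0:
        raise ValueError("no snapshot on or before the requested date")
    snapshot_date, members = snapshots[i]
    result = set(members)
    for change_date, added, removed in changes:
        if snapshot_date < change_date <= as_of:  # replay later changes up to as_of
            result |= set(added)
            result -= set(removed)
    return result
```

Given a snapshot containing `OLD` and a later change event that removes `OLD` and adds `NEW`, a query dated before the change still returns `OLD`. That is the property that keeps failing companies eligible until they actually leave the index.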

Two different "win rates" — which 100% is real and which is impossible

The impossible one: a ~100% monthly hit rate, or a 100% lump-sum win rate, with parabolic upside and no downside. That does not exist on this data and we will not claim it. The cross-sectional information coefficient (IC) within the S&P 500 cohort is ~0.04 for the GBM plus ~0.02 from the Chronos filter — small but real, and at the upper end of the ~0.06 literature ceiling for a price-only large-cap signal. Any strategy with finite IC has a ceiling; the monthly hit rate here is ~58% (barely above a coin-flip) and short-horizon drawdowns reach ~-77% on the lump-sum picker. We separately built and honestly killed multiple attempts to beat this (a Chronos-distribution downside model, conviction-adaptive concentration, vol-targeting and drawdown-breaker overlays) — they failed or bled too much return, confirming there is essentially one independent alpha here.

The real one: a rolling-10-year DCA-vs-S&P-DCA win rate of 100%. These are completely different statistics. The first asks "is any single month/lump-sum a winner?" (no, not reliably). The second asks "if I contribute every month for 10 years, do I end up ahead of the same contributions into the S&P 500?" — and on PIT data 2003–2026 the answer was yes in 159 of 159 rolling 10-year windows (a literal 100% under the deployed E2 two-sleeve strategy; the worst such 10-year window still grew the money nearly 6×, and 99% of 5-year windows also won). A low monthly hit rate and a high multi-year DCA win rate are mathematically consistent: the big winners dominate the terminal portfolio while steady monthly contributions average through the drawdowns. The honest claim is the second one, with its multi-year-commitment, front-loading and interim-drawdown caveats stated plainly — never the first.
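
The rolling statistic can be sketched as follows. It assumes two aligned lists of monthly returns and contributions made at the start of each month; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def dca_value(monthly_returns: Sequence[float]) -> float:
    """Final value of $1 contributed at the start of every month."""
    value = 0.0
    for r in monthly_returns:
        value = (value + 1.0) * (1.0 + r)
    return value


def rolling_dca_wins(strategy: Sequence[float], benchmark: Sequence[float],
                     window: int) -> Tuple[int, int]:
    """Return (wins, windows) over every possible start month."""
    n = len(strategy)
    if n != len(benchmark) or window > n:
        raise ValueError("series must be aligned and at least one window long")
    windows = n - window + 1
    wins = 0
    for start in range(windows):
        end = start + window
        if dca_value(strategy[start:end]) > dca_value(benchmark[start:end]):
            wins += 1
    return wins, windows
```

A history of n months contains n − 120 + 1 ten-year windows, so a 278-month history would give the 159 windows quoted here; the exact month range behind the published count is an assumption. Adjacent windows share 119 of their 120 months, so the 159 outcomes are far from independent.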

That the underlying picker stays positive out-of-sample across the GFC, COVID, the 2022 bear AND the 2023-24 AI rally is the strongest non-overfit evidence the engine is real — it is what makes the long-horizon DCA edge trustworthy rather than a backtest artifact.