We Backtested 1,643 of Our Own Pattern Calls Against SPY. Here is What Held Up.

Chart patterns are the easiest thing in finance to make a content business out of, and the hardest thing to defend with numbers. Every retail-investing site you have ever opened explains head-and-shoulders, double bottoms, flags, and divergences as though they obviously work. Almost none of those sites ever shows you the win rate.

We run a pattern scanner for a living, so it felt dishonest to keep dodging the question. In May 2026 we took every single chart-pattern call our scanner had published since 2025-01-01 — 1,643 patterns across 12 detector types, on the S&P 500 / Nasdaq 100 / major US ETF universe — and walked each one forward in time against the SPY benchmark.

The results were uncomfortable enough that we decided to put them on the public blog instead of keeping them in an internal doc. This post is that doc, cleaned up.

TL;DR. After subtracting SPY's market return, most of our bullish trend signals lag the index, and a textbook head-and-shoulders bottom actually predicts underperformance. Two specific signal families survive: bearish trend calls (trend −1 at the 60-day horizon, 63.6% alpha-positive) and bullish divergence signals (MACD bullish divergence at 20 days, 71.1% alpha-positive). The simple, naive bullish trend follower is the worst large-sample trade in the table.


Why we test against SPY instead of the raw return

In a year like 2025, almost every chart pattern looks like a winner if you measure raw returns. The S&P was up roughly 22% over the period we studied. Any random bullish call you wrote down would have closed in the green more than half the time, not because the call was right, but because the tide was rising.

The honest question is whether the pattern added information beyond owning SPY. For every pattern call, we compute three numbers:

  • Raw return: how the stock moved over the forward window.
  • Baseline return: how SPY moved over the same calendar window.
  • Alpha: the raw return minus the baseline, sign-flipped for bearish patterns so that "the prediction was right" always shows as a positive number.

Concretely, for a pattern call on date t with a forward horizon of w trading days:

$$\alpha_w = (r_w - b_w) \times \text{bear\_bull}$$

where $r_w$ is the stock's return from t to t+w, $b_w$ is SPY's return over the same calendar dates (calendar-aligned, not trading-day-aligned, to handle ticker halts cleanly), and bear_bull is +1 for bullish patterns and −1 for bearish patterns. A win is $\alpha_w > 0$.

Under that definition, a 50% win rate means the signal is indistinguishable from a coin flip. Anything materially above 50% is alpha; anything materially below 50% means the signal is anti-predictive.
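A minimal sketch of that definition, for readers who want to check the sign convention. The function names and prices are hypothetical; this is not the code in backtest/winrate.py.

```python
def simple_return(start_price: float, end_price: float) -> float:
    """Fractional return between two closes, e.g. 0.05 for +5%."""
    return end_price / start_price - 1.0


def alpha(stock_ret: float, spy_ret: float, bear_bull: int) -> float:
    """Benchmark-relative return, sign-flipped so a correct call is positive.

    bear_bull is +1 for a bullish pattern and -1 for a bearish one.
    """
    if bear_bull not in (1, -1):
        raise ValueError("bear_bull must be +1 or -1")
    return (stock_ret - spy_ret) * bear_bull


# Made-up prices: the stock gains 3% while SPY gains 5% over the same dates.
r_w = simple_return(100.0, 103.0)
b_w = simple_return(500.0, 525.0)

bullish_alpha = alpha(r_w, b_w, +1)  # negative: the stock rose but lagged SPY
bearish_alpha = alpha(r_w, b_w, -1)  # positive: a bearish call on a laggard wins
```

The example is the case that raw-return scoring hides: the stock closed higher, yet a bullish call on it counts as a loss and a bearish call counts as a win.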

This is a much harsher bar than the "did the stock go up?" question that most pattern guides answer.


How we ran it

  • Universe: S&P 500 + Nasdaq 100 + major US ETFs, roughly 500–800 tickers in scope at any moment.
  • Time window: every pattern whose end_date falls between 2025-01-01 and the present.
  • Forward horizons: 5, 20, and 60 trading days after end_date.
  • Baseline: SPY, calendar-aligned to each pattern's end_date.
  • Aggregation: results grouped by (pattern_type, bear_bull). We do report a finer sub_type slice internally, but the headline numbers are clearer at the type level.
  • Sample: 1,695 patterns loaded; 1,643 (96.9%) had enough forward data on at least one window. The 52 skipped patterns were too new to score on any window, meaning their end_date fell inside the most recent 5 trading days. Each longer window excludes more: a call needs 20 trading days of forward data to appear under w20 and 60 to appear under w60.

A practical note on the 60-day column: patterns whose end_date is within the last 60 trading days are silently absent from that column. Where the resulting sample drops into single digits, the cell is unreliable. We flag those below.
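The grouping step can be sketched roughly as follows. The record layout (a dict with an `alpha` entry per window, `None` when the call is too recent) is an assumption made for illustration, not the scanner's actual schema, and the sample calls are invented.

```python
from collections import defaultdict

WINDOWS = ("w5", "w20", "w60")


def aggregate_win_rates(calls):
    """Group calls by (pattern_type, bear_bull) and compute win rates per window.

    Each call is a dict with 'pattern_type', 'bear_bull', and an 'alpha' dict
    mapping a window name to a float, or to None when the call is too recent
    to have that much forward data.
    """
    buckets = defaultdict(list)
    for call in calls:
        buckets[(call["pattern_type"], call["bear_bull"])].append(call)

    results = {}
    for key, members in buckets.items():
        row = {"n": len(members)}
        for w in WINDOWS:
            # Only calls with forward data for this window are scored.
            scored = [c["alpha"][w] for c in members if c["alpha"].get(w) is not None]
            wins = sum(1 for a in scored if a > 0)
            row[w] = {
                "scored": len(scored),
                "win_rate": wins / len(scored) if scored else None,
            }
        results[key] = row
    return results


# Invented calls, not real scanner output.
sample_calls = [
    {"pattern_type": "trend", "bear_bull": -1, "alpha": {"w5": 0.01, "w20": 0.03, "w60": None}},
    {"pattern_type": "trend", "bear_bull": -1, "alpha": {"w5": -0.02, "w20": 0.01, "w60": 0.04}},
    {"pattern_type": "trend", "bear_bull": 1, "alpha": {"w5": 0.0, "w20": None, "w60": None}},
]
table = aggregate_win_rates(sample_calls)
```

Keeping a per-window `scored` count next to the bucket's `n` is what makes thin cells visible: a bucket can have dozens of calls overall and still have only a handful old enough for the 60-day column.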


The full table

This is every aggregate from the run, sorted by detector type. Win values are alpha-positive percentages — the percentage of pattern calls in the bucket whose forward return beat SPY in the predicted direction.

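| Type | Direction | n | 5d win | 20d win | 60d win |
| --- | --- | --- | --- | --- | --- |
| trend | ▼ Bearish | 298 | 56.0% | 57.2% | 63.6% |
| trend | ▲ Bullish | 487 | 40.7% | 41.3% | 23.5% |
| resist_support | ▼ Bearish | 23 | 73.9% | 33.3% | 50.0% |
| resist_support | ▲ Bullish | 22 | 40.9% | 53.8% | 0.0% ⚠︎ |
| flag | ▼ Bearish | 37 | 48.6% | 33.3% | 50.0% |
| flag | ▲ Bullish | 51 | 49.0% | 53.3% | 100.0% ⚠︎ |
| double_ex | ▼ Bearish | 24 | 70.8% | 63.6% | 33.3% |
| head_shoulder | ▼ Bearish | 41 | 56.1% | 51.9% | 50.0% |
| head_shoulder | ▲ Bullish | 46 | 32.6% | 27.3% | 0.0% ⚠︎ |
| divergence_rsi | ▲ Bullish | 57 | 52.6% | 64.5% | 0.0% ⚠︎ |
| divergence_macd | ▼ Bearish | 16 | 43.8% | 22.2% | 50.0% |
| divergence_macd | ▲ Bullish | 68 | 50.0% | 71.1% | 0.0% ⚠︎ |
| avg_cross_60 | ▼ Bearish | 65 | 64.6% | 51.4% | 25.0% |
| avg_cross_60 | ▲ Bullish | 101 | 44.6% | 41.4% | 0.0% ⚠︎ |
| avg_cross_120 | ▼ Bearish | 23 | 52.2% | 46.7% | 75.0% |
| avg_cross_120 | ▲ Bullish | 25 | 52.0% | 53.8% | 0.0% ⚠︎ |
| avg_openUp | ▼ Bearish | 46 | 52.2% | 60.0% | 75.0% |
| avg_openUp | ▲ Bullish | 69 | 39.1% | 44.0% | 0.0% ⚠︎ |
| super_trend_adx | ▼ Bearish | 73 | 60.3% | 53.7% | 75.0% |
| super_trend_adx | ▲ Bullish | 70 | 48.6% | 58.1% | 0.0% ⚠︎ |

Legend: green ≥ 60%; red < 45%; ⚠︎ single-digit sample, unreliable.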

The interesting parts of this table are not the random-looking middle. They are the three patterns that appear consistently.


Finding 1: hunting weak stocks works better than hunting strong stocks

Take the four detector families with enough samples on both sides — trend, avg_openUp, super_trend_adx, flag — and line up the 20-day alpha for the bullish (+1) and bearish (−1) call:

| Pattern | Direction | n | 20d win |
| --- | --- | --- | --- |
| trend | ▼ Bearish | 298 | 57.2% |
| trend | ▲ Bullish | 487 | 41.3% |
| avg_openUp | ▼ Bearish | 46 | 60.0% |
| avg_openUp | ▲ Bullish | 69 | 44.0% |
| super_trend_adx | ▼ Bearish | 73 | 53.7% |
| super_trend_adx | ▲ Bullish | 70 | 58.1% |
| flag | ▼ Bearish | 37 | 33.3% |
| flag | ▲ Bullish | 51 | 53.3% |

Legend: green ≥ 60%; red < 45%; ⚠︎ single-digit sample, unreliable.

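At the 20-day horizon, two of the four families (trend and avg_openUp) produce more alpha when used to find weak stocks than when used to find strong ones; super_trend_adx and flag lean the other way. At 5 days the count is three of four, with super_trend_adx joining at 60.3% bearish against 48.6% bullish. This is unintuitive, because every retail content site frames pattern analysis as a tool to "find the next big winner." Our data, on this universe, says the more reliable use is "find names that are going to lag the index."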

There are three plausible mechanisms for this:

  1. The short side is less crowded. Both retail and institutional money skews long. A bearish chart signal has fewer participants leaning on it, so it gets arbitraged less. Whatever alpha is in the signal survives longer.
  2. Falling is asymmetric. Stocks fall when buying disappears, not when selling appears. Technical indicators that look for exhaustion are structurally better at calling that disappearance than they are at calling fresh demand.
  3. In a single-direction bull tape, the laggards are real. If SPY is grinding upward and a stock fails to participate, something is genuinely wrong with the stock — sector rotation, fundamentals, liquidity. The pattern is detecting that wrongness rather than predicting it.

The honest interpretation is some mixture of (1), (2), and (3), and the proportions will look different in a sideways or bear tape, a caveat we return to under "What this study does not say."


Finding 2: bullish trend signals are anti-predictive over 60 days

Here is every bullish trend / momentum signal, 60-day window only:

| Signal (Bullish) | n | 60d win |
| --- | --- | --- |
| trend | 487 | 23.5% |
| avg_cross_60 | 101 | 0.0% ⚠︎ |
| avg_cross_120 | 25 | 0.0% ⚠︎ |
| avg_openUp | 69 | 0.0% ⚠︎ |
| super_trend_adx | 70 | 0.0% ⚠︎ |

Legend: green ≥ 60%; red < 45%; ⚠︎ single-digit sample, unreliable.

The 0% values are mostly the sample-size warning — those buckets have only a handful of patterns old enough to qualify, and they all happened to disappoint. Take those with appropriate skepticism. But the trend +1 number is the headline result of this entire study: with n = 487, the most-sampled bucket in our dataset, the bullish-trend signal is anti-predictive at the 60-day horizon, with only 23.5% of calls beating SPY. The remaining 76.5% lagged.

The damage is not confined to the long horizon, but it builds with holding time. The same trend +1 bucket shows 40.7% alpha-positive at 5 days and 41.3% at 20 days. Both sit below the 50% coin-flip line, so bullish trend calls already lag SPY over days and weeks; the gap then widens sharply by 60 days. From a quant-finance perspective this resembles a reversal signature: whatever short-term continuation exists in raw prices is too small to beat the index, and the later reversal dominates the cumulative result. From a chart-pattern practitioner's perspective it is the inverse of what the textbooks promise.

The cleanest read on trend +1: it shows no edge over SPY at any horizon we measured, and holding it for more than four to six weeks makes the result markedly worse.


Finding 3: divergence patterns are the only bullish family with persistent alpha

Of every bullish (+1) family we tracked, only two have a 20-day alpha win rate materially above 50%:

  • divergence_macd (+1): 71.1% at w20, n = 68. The highest 20-day figure of any bullish bucket in the table.
  • divergence_rsi (+1): 64.5% at w20, n = 57.

This is the most interesting case in the whole study. Divergence is not a momentum signal: it does not say "this stock is going up." It says "this stock is going down, but the momentum is fading." It catches a turning point rather than continuation. In the framing of Finding 1, it is closer to "find an oversold loser whose selling pressure is exhausting" than to "find a strong stock." Mechanically, it is a contrarian signal dressed in technical-analysis clothing.

The 60-day column for both divergence families is sample-size-warned and noisy, so we make no claim about persistence beyond a month. But across a 20-day window, in our universe, with n in the high tens, divergence is the most reliable bullish signal we publish.
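One way to judge whether a win rate in the high tens of samples is distinguishable from a coin flip is an exact one-sided binomial test. The sketch below is illustrative only: this post does not report how many calls in each bucket were actually scored at w20, so the counts are placeholders chosen to land near 71%, and the test assumes independent calls, which the caveats further down say is optimistic.

```python
from math import comb


def binomial_tail(wins: int, scored: int, p: float = 0.5) -> float:
    """P(X >= wins) for X ~ Binomial(scored, p): a one-sided exact test."""
    return sum(
        comb(scored, k) * p**k * (1 - p) ** (scored - k)
        for k in range(wins, scored + 1)
    )


# Placeholder counts, not figures from the backtest.
wins, scored = 32, 45
win_rate = wins / scored
p_value = binomial_tail(wins, scored)
```

A small `p_value` says a fair coin would rarely produce that many wins. Correlated calls shrink the effective sample, so the real evidence is weaker than the number suggests.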


A specific case: head_shoulder (+1) is reversed

The textbook bullish head-and-shoulders pattern (an inverted head-and-shoulders bottom, the "ground floor before a rally") performs as follows:

| Signal | n | 5d win | 20d win |
| --- | --- | --- | --- |
| head_shoulder (+1) | 46 | 32.6% | 27.3% |

Legend: green ≥ 60%; red < 45%; ⚠︎ single-digit sample, unreliable.

The signal is not weak. It is reversed. Buying after a head-and-shoulders bottom in 2025–2026, on this universe, has been a sub-coin-flip trade against SPY at every horizon we measured. One plausible behavioural explanation: enough traders learned the textbook that "long after an HS bottom" is now a crowded trade, the entry pushes prices up in the short term, and the move reverses on a horizon long enough for the textbook holders to give up.

This is precisely the kind of result you cannot get from reading the textbook. You can only get it from counting.

We are not removing head_shoulder (+1) from the scanner. We are using the historical win-rate field to signal to subscribers that this particular cell is currently anti-predictive.


Where the historical numbers show up in the product

If you are a paid subscriber, you will already have seen this: every pattern card and detail page in Pattern Vista now carries a small badge showing the 20-day historical alpha win rate for that exact (pattern_type, direction, sub_type) bucket. Cells above 60% get an emerald tag, cells between 45% and 60% get a neutral grey, and cells below 45% get a red tag. A Trend +1 card today shows in red. A MACD divergence +1 card today shows in emerald.
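The colour rule is simple enough to state as a function. This is a sketch, not the production code: the function name is hypothetical, and it assumes the boundaries follow the table legend (green at 60% or above, red strictly below 45%).

```python
def badge_color(win_rate_pct: float) -> str:
    """Map a 20-day historical alpha win rate (in percent) to a badge colour."""
    if win_rate_pct >= 60.0:
        return "emerald"
    if win_rate_pct < 45.0:
        return "red"
    return "grey"


# 20-day win rates taken from the full table above.
trend_bullish = badge_color(41.3)
macd_divergence_bullish = badge_color(71.1)
super_trend_adx_bearish = badge_color(53.7)
```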

This is a small UI element with a single design goal: to keep you from over-reading any individual signal. The detector is what it is. The badge tells you, in one glance, how the same detector has played out historically. It is updated every night from the same data described in this post.

If you are on the free tier, the badge shows as a locked icon — you can see that the data exists, you can read this article to understand what it means, and you can upgrade to read individual cells.


What this study does not say

This was a focused experiment with several deliberate restrictions, and the conclusions only travel as far as those restrictions allow.

  1. The universe is the 2025–2026 US bull tape. A single regime. The "bearish patterns produce alpha" finding may well be a property of bull markets specifically — when the tide is rising, the boats that fail to lift have a reason, and the reason can be detected. In a sideways or bear regime, that asymmetry may disappear, reverse, or get drowned in volatility. We are not yet able to test that.

  2. Bullish-momentum families being anti-predictive may itself be regime-conditional. The same trend +1 signal that lost to SPY in 2025–2026 might have been the trade of the decade in 2017 or 2024. Cross-regime backtests are next on our roadmap.

  3. No industry or market-cap slicing. A super_trend_adx +1 call on a large-cap healthcare name and on a small-cap miner should not be expected to have the same alpha. We aggregated; we have not yet sliced.

  4. Survivorship bias. Our universe is current index constituents. Tickers that were delisted between 2025 and now are absent. This flatters the bullish numbers in particular, because the most catastrophic outcomes (going to zero) are precisely what gets removed from the universe. For bearish calls the same omission works the other way, since a collapse would have counted as a win.

  5. No transaction costs, no slippage. Real-world execution will eat into every alpha number above. The thinly-traded names will eat more.

  6. Each pattern is treated as independent. A single name producing five patterns in a week becomes five entries in the sample, even though those entries are highly correlated. The effective sample is smaller than n.

  7. The patterns are our scanner's calls, not the academic definitions. Our parameter choices on each detector have a material effect: a stricter threshold would have produced fewer, higher-quality calls; a looser one, the opposite. The alpha numbers belong to our scanner, not to "head-and-shoulders" as a Platonic concept.

If any of these caveats are deal-breakers for the decision you are about to make on the back of one of our signals, take the badge as advisory, not prescriptive.
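To get a feel for how much caveat 6 could shrink the sample, one crude approach is to collapse bursts of calls on the same ticker and direction into a single entry. The sketch below is an assumption-laden illustration: the seven-day gap, the tuple layout, and the tickers are all invented, and it is not something the backtest currently does.

```python
from datetime import date


def collapse_clusters(calls, gap_days=7):
    """Keep a call only if gap_days have passed since the last kept call
    for the same (ticker, bear_bull).

    calls: iterable of (ticker, bear_bull, end_date) tuples.
    Returns the retained calls in date order.
    """
    kept = []
    last_kept = {}
    for ticker, bear_bull, end_date in sorted(calls, key=lambda c: c[2]):
        key = (ticker, bear_bull)
        prev = last_kept.get(key)
        if prev is None or (end_date - prev).days >= gap_days:
            kept.append((ticker, bear_bull, end_date))
            last_kept[key] = end_date
    return kept


# Invented calls: four bullish calls on one ticker, one bearish call on another.
raw_calls = [
    ("AAA", 1, date(2025, 3, 3)),
    ("AAA", 1, date(2025, 3, 4)),
    ("BBB", -1, date(2025, 3, 4)),
    ("AAA", 1, date(2025, 3, 6)),
    ("AAA", 1, date(2025, 3, 12)),
]
independent_calls = collapse_clusters(raw_calls)
effective_n = len(independent_calls)
```

Comparing `effective_n` with the raw count for a bucket gives a rough sense of how much weight its win rate deserves.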


What we are doing next

Three specific follow-ups are queued:

  • Cross-regime backtest. Pull pattern history back to 2018 or earlier, separate bull / bear / sideways regimes, re-run. This is the single most important question we cannot currently answer.
  • Signal-stacking experiments. Take the high-alpha buckets (divergence MACD +1, trend −1, double_ex −1) and test whether adding a second filter (a 250-day moving-average distance check, a multi-timeframe agreement gate) pushes the alpha higher. The roadmap thread for this is internally tagged "deviation ranking" and "trend-strength scoring."
  • Industry / market-cap slices. Re-bucket the existing 1,643-sample dataset by sector and by size decile, look for whether the alpha is concentrated in specific corners.

Each of those is a separate post when it lands. For now, the practical takeaway from this one is the badge — every card in Pattern Vista now tells you how the underlying signal has done historically. Use it as one input among several, and read the cells, not the headlines.


Reproducibility

The code that produced the table above is in our backtest/winrate.py and backtest/writeback.py modules. The pattern history and price caches that fed it are append-only and live alongside the production scanner. The Supabase column that powers the on-site badge — patterns.hist_win_w20, plus a richer hist_stats JSON blob for w5 / w20 / w60 per bucket — is refreshed by the same scripts every night.

If you would like the raw numbers behind a specific row in the table above, or you would like us to re-run a slice you care about (a specific sector, a specific window, raw return instead of alpha), reach out — we are interested in pressure on this work.