The structural case for narrative risk — and how we measure it.
285,245 observations · 2009-01 through 2025-02 formation window · 12-month forward return horizon · $5 minimum price filter · bootstrap 95% CI (1,000 samples) · 7-year trailing anchor window
This document is designed to answer one question: whether the claim that structural narrative risk is measurable — and that measuring it produces a consistent, defensible edge — is actually supported by the data. What follows attempts to answer that honestly, including the parts of the evidence that are less convenient than the headline numbers suggest.
Across 285,245 observations spanning 17 years and three distinct market regimes, companies that generate free cash flow produced median annual returns of +7.7%. Companies whose valuations rest on revenue alone — companies that have not yet earned their way to profit, let alone cash generation — produced median annual returns of −29.7%.
The spread is 37.3 percentage points. It held during the post-crisis recovery. It held during the extended bull market. It held during COVID disruption and post-COVID rate normalization. What a company has actually built into its financials predicts its subsequent returns in a way that what it has merely promised does not — and this has been true across every market condition in the dataset.
The framework exists to measure that distinction systematically, at the company level, across the full universe. But this finding is independent of the framework. It is what the data shows before any model is applied.
Median 12-month return by anchor rung · full period 2009–2026
FCF vs Revenue median spread: +37.3pp · Held across all regimes tested · n=290,902
Empirical confirmation
The OAL spread (the return gap between anchor rungs shown above) indicates that anchor depth predicts returns. The framework's job is to measure anchor depth, and its distance from current valuation, for every company in the universe, simultaneously, on a comparable basis.
Every company in the scored universe is assigned to the deepest financial rung it has genuinely sustained over a trailing seven-year window. Free cash flow is the deepest. Net income is next. Then operating income. Then revenue. A company must have produced positive cumulative figures over the full seven-year period to qualify for a given rung — a single strong quarter is not sufficient. The rung assignment answers a precise question: what has this company actually demonstrated, not over its best recent period, but consistently, across a full business cycle window?
Anchor rung alone is not enough. Two companies can occupy the same rung — say, both FCF-positive — while their valuations sit at entirely different distances from that anchor. A company generating modest free cash flow at a 60× EV multiple is structurally different from a company generating substantial free cash flow at a 12× multiple, even though both qualify for the same rung. The first axis of the framework measures that distance: how far has the valuation stretched beyond what the anchor actually supports, adjusted for how shallow that anchor is?
The second axis measures something the first cannot see. A company's current valuation-to-anchor ratio is a snapshot. It tells you where the company is. It does not tell you whether the anchor itself is strengthening or eroding. A company whose free cash flow has been deteriorating steadily for six quarters carries a different structural risk profile than a company at the same current multiple whose free cash flow has been improving. The second axis captures that trajectory across the anchor's history, with the most recent comparisons weighted more heavily than older ones.
Neither axis alone produces the discrimination the composite achieves. A valuation-stretch signal without trajectory context misses companies that are expensive but improving. A trajectory signal without valuation context flags deteriorating companies regardless of how conservatively they are priced. The composite score — the equal-weight mean of both axes, expressed as a percentile rank across the full universe — is where both conditions are required simultaneously. A company scores into the highest structural risk bucket only when its valuation has stretched far beyond its anchor and that anchor is deteriorating. Both. The signal concentrates precisely because the threshold is precise.
Axis 1
Anchor Detachment Risk
r = −0.0753
Near-zero pre-2020. Dominant in COVID (r=−0.261) and post-COVID (r=−0.097).
The 7-year cumulative anchor is deliberate. A company must have demonstrated positive output over a full seven-year trailing window, not just in the most recent quarter. The shallowness penalty then encodes the structural distance between OAL rungs: a revenue-anchored company is treated as structurally more expensive relative to its anchor than an FCF-anchored company at the same raw multiple.
Axis 2
Anchor Degradation Risk
r = −0.0743
Present in all regimes including pre-2020 (r=−0.037). Dominant post-COVID (r=−0.095).
Axis 2 captures trajectory, not position. A company moving toward deeper operational grounding is accumulating structural strength, whether or not the market has noticed. A company moving away from it is accumulating fragility, whether or not the price has moved. The score is derived from year-over-year comparisons across up to 28 quarters of anchor history, adjusted by an OAL shallowness penalty and ranked globally. The four most recent year-over-year comparisons receive double weight.
Axis 3 — Anchor Coverage Risk is a contextual disclosure layer. It is evaluated separately for firms with non-zero interest obligations and is not included in the composite score. The decision reflects an empirical finding: including Axis 3 at any weighting does not materially improve the composite's full-period signal. Firms without interest expense are not assigned an Axis 3 score. Their composite is the normalized mean of Axis 1 and Axis 2 only. Approximately 16% of the current universe falls outside Axis 3's domain.
How to read Spearman r in an equity factor context
Cross-sectional Spearman r measures rank-order consistency between a risk score and subsequent returns across all companies simultaneously. The scale differs materially from behavioural or clinical research. The Fama-French value factor — one of the most replicated factors in academic finance — produces Spearman r in the 0.03–0.06 range in cross-sectional studies. Momentum produces 0.05–0.09.
| Factor | Spearman r | 95% CI | Label | N |
|---|---|---|---|---|
| Axis 1 — Anchor Detachment Risk | −0.0753 | [−0.0791, −0.0715] | Strong | 291,145 |
| Axis 2 — Anchor Degradation Risk | −0.0743 | [−0.0780, −0.0704] | Strong | 285,245 |
| Composite (equal weight) | −0.0907 | [−0.0942, −0.0868] | Strong | 285,245 |
Bootstrap 95% confidence intervals (1,000 samples). All p-values round to 0.0000 at four decimal places (p < 0.0001). ICIR: Composite −0.7343, Axis 2 −0.6364, Axis 1 −0.5406. Inter-axis Pearson correlation: 0.0088 (shared variance <0.01%).
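To make the statistics in the table concrete, the following is a minimal sketch of a cross-sectional Spearman r with a bootstrap 95% interval. It assumes simple resampling of observations with replacement and a percentile interval; the document does not state which bootstrap variant was used. The data are synthetic and the function names are illustrative.

```python
import random
from bisect import bisect_left, bisect_right


def ranks(values):
    """Average ranks (1-based); tied values share the same rank."""
    s = sorted(values)
    return [(bisect_left(s, v) + bisect_right(s, v) + 1) / 2 for v in values]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman r is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def bootstrap_ci(x, y, samples=1000, level=0.95, seed=0):
    """Percentile bootstrap interval (assumed variant, see lead-in)."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(spearman([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * samples)]
    hi = stats[int((1 + level) / 2 * samples) - 1]
    return lo, hi


# Synthetic data: higher risk score, lower forward return, plus noise.
rng = random.Random(42)
scores = [rng.random() for _ in range(200)]
returns = [-0.5 * s + rng.gauss(0.0, 0.3) for s in scores]

r = spearman(scores, returns)
lo, hi = bootstrap_ci(scores, returns)
print(f"Spearman r = {r:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

A negative r here means higher risk scores line up with lower subsequent returns, which is the sign convention used in the table.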
The question serious allocators ask about any new framework is whether it carries information beyond what established factors already explain. If the signal is merely a repackaging of value, profitability, or momentum, it adds nothing that cheaper instruments don't already provide.
The framework has been tested against the Fama-French five-factor model plus momentum — the most comprehensive standard factor benchmark in academic equity research — controlling for market beta, size, value, profitability, investment, and momentum simultaneously. The long-short portfolio was regressed against all six factors with Newey-West standard errors to account for return autocorrelation.
After controlling for all six factors, the portfolio produces +20% annualized alpha with a t-statistic of 3.72, against a conventional significance threshold of roughly 2.0. The factor model explains 3.48% of long-short return variance. The remaining 96.52% is not explained by the established factor set: not by size, value, profitability, momentum, or any linear combination of them. This is not a repackaging of known signals.
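For readers who want to see how an alpha, its t-statistic, and the explained variance come out of such a regression, here is a hedged sketch of OLS with Newey-West (Bartlett kernel) standard errors at 3 lags, written directly in NumPy. The factor and portfolio series are random placeholders, not the Ken French data or the framework's returns, and multiplying the monthly intercept by 12 is an assumed annualization convention.

```python
import numpy as np


def ols_newey_west(y, X, lags=3):
    """OLS with an intercept and Newey-West HAC standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    scores = X * resid[:, None]            # row t is u_t * x_t
    meat = scores.T @ scores               # lag-0 term
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)  # Bartlett kernel weight
        gamma = scores[lag:].T @ scores[:-lag]
        meat += weight * (gamma + gamma.T)
    cov = xtx_inv @ meat @ xtx_inv         # sandwich estimator
    se = np.sqrt(np.diag(cov))
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, se, beta / se, r2


# Placeholder data: 205 months, six factor columns, one long-short series.
rng = np.random.default_rng(0)
factors = rng.normal(0.0, 0.03, size=(205, 6))
long_short = 0.015 + 0.1 * factors[:, 0] + rng.normal(0.0, 0.04, size=205)

beta, se, t_stats, r2 = ols_newey_west(long_short, factors, lags=3)
print(f"monthly alpha   = {beta[0]:.4f}")
print(f"annualized (x12) = {beta[0] * 12:.2%}")
print(f"alpha t-stat    = {t_stats[0]:.2f}")
print(f"R-squared       = {r2:.4f}")
```

The intercept is the factor-adjusted alpha, and R-squared is the share of long-short variance that the factors explain. A low R-squared alongside a significant intercept is the pattern the paragraph above describes.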
The RMW loading — the profitability factor — is directionally positive, as expected. A framework that measures operational grounding should correlate with profitability. But the loading does not reach statistical significance, which means the signal is not simply a restatement of the profitability premium. The framework is detecting something the profitability factor, at its level of resolution, is not capturing.
The alpha holds across all three market regimes — including the extended pre-2020 bull market. Pre-2020 factor-adjusted long-short alpha: +16.5% annualized, t-statistic 3.55 — statistically significant across 132 months of a market that spent the peak of its narrative cycle rewarding exactly the companies the framework flags as structurally fragile. A signal that only survives market disruption is a crisis hedge, not a structural framework. The pre-2020 result is the more demanding test, and it holds.
The composite signal varies materially across market regimes. This is expected and disclosable — the framework measures structural risk, and structural risk does not resolve on the same schedule in every market environment.
What the regime data shows is not a framework that works only during disruption. It shows a framework whose cross-sectional signal strengthens when narrative valuations are correcting and attenuates when the market is most aggressively rewarding the companies it flags as fragile. Those are different statements. The first would describe a crisis indicator. The second describes a structural signal behaving exactly as theory predicts.
During COVID and post-COVID rate normalization — the periods when stretched valuations faced the harshest structural test — the composite signal is Substantive. During the extended pre-2020 bull market, it is Slight. The full-period Spearman r of −0.091 reflects 17 years of both environments averaged together.
Composite Spearman r by market regime · 285,245 observations · 2009–2026
The regime chart above shows rank-order consistency across the full return distribution. What it does not show — and what the factor-adjusted results capture — is that the structural premium was present and statistically significant even when the cross-sectional signal was most attenuated.
Pre-2020 Signal — Honest Disclosure
The pre-2020 composite signal (r = −0.025) is lower than in subsequent regimes but is neither absent nor negligible in the factor-adjusted sense. The distinction matters. Spearman r measures rank-order consistency across the full return distribution simultaneously. When narrative premium expansion broadly lifts companies across Q1 through Q4 — as it did during 2017–2019 — cross-sectional rank correlation attenuates even when the highest-risk tail is diverging sharply. The factor-adjusted long-short portfolio strips this market-wide effect and isolates the structural signal. Pre-2020, that signal produced +16.5% annualized alpha with a t-statistic of 3.55.
The weakest sub-period within the pre-2020 window is 2017–2019, when narrative premium expansion was at its peak and structurally fragile companies were broadly rewarded by the market. The 2019 peak inversion (r = +0.088) is the strongest anti-signal year in the dataset. The 2021 signal (r = −0.437, Substantive) reflects the subsequent collapse of those same narratives.
The pattern is structurally coherent. The framework correctly identified which companies were fragile throughout. The market spent 2017–2019 rewarding exactly that fragility at the cross-sectional level. The factor-adjusted alpha confirms the structural condition was present and real the entire time. When the narrative cycle broke, both signals converged.
Year by year, the pattern is legible. Signal strength tracks the relationship between narrative cycles and structural reality. The 2019 inversion — the peak of narrative premium expansion — is the framework's weakest year in the dataset. The 2021 result — the sharpest single-year signal — reflects the collapse of the same narratives that drove 2019's inversion. These are not coincidences. They are the framework measuring what it claims to measure.
Composite Spearman r · year by year · 2009–2025
Red bars = signal inversion (higher risk outperformed). 2019 peak inversion (r=+0.088) coincides with the narrative premium cycle peak. 2021 extraordinary signal (r=−0.437) reflects the subsequent collapse.
Median 12-month return by composite quintile
Signal concentrates in the Q5 penalty: geometric mean return −10.9%, hit rate 51.2%.
Quintile table — full period
| Quintile | Median return | Geometric mean return | Hit rate | N |
|---|---|---|---|---|
| Q1 — Lowest Risk | +10.8% | +9.0% | 64.0% | 57,049 |
| Q2 | +9.5% | +6.7% | 62.5% | 57,049 |
| Q3 | +9.2% | +5.5% | 61.4% | 57,049 |
| Q4 | +9.5% | +5.3% | 62.4% | 57,049 |
| Q5 — Highest Risk | +1.3% | −10.9% | 51.2% | 57,049 |
Q1–Q5 median spread: +9.5 pp · t=17.56 · p < 0.0001 · Full period 2009–2026.
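The columns of the quintile table can be reproduced along the following lines. This sketch assumes that the geometric mean is the geometric mean of (1 + r) across observations minus 1, and that the hit rate is the share of observations with a positive 12-month return; neither definition is spelled out in the document. The ten observations are made up.

```python
import math
from statistics import median


def quintile_stats(scores, returns):
    """Sort by risk score, split into five equal groups, summarize each."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    rows = []
    for q in range(5):
        members = [returns[i] for i in order[q * n // 5:(q + 1) * n // 5]]
        # Geometric mean of gross returns; requires every return > -100%.
        geo = math.exp(sum(math.log1p(r) for r in members) / len(members)) - 1.0
        hit = sum(r > 0 for r in members) / len(members)  # assumed definition
        rows.append({"quintile": q + 1, "median": median(members),
                     "geo": geo, "hit": hit, "n": len(members)})
    return rows


# Hypothetical observations: risk scores 1..10 and 12-month returns.
scores = list(range(1, 11))
returns = [0.10, 0.20, 0.05, 0.15, 0.02, 0.10, 0.08, -0.02, 0.50, -0.60]

rows = quintile_stats(scores, returns)
for row in rows:
    print(row)
```

In the toy Q5 group a +50% and a −60% outcome give a median of −5% but a geometric mean near −22.5%. Dispersion in the highest-risk group is what opens up a gap of this kind between the two measures.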
The framework identifies structural conditions, not outcomes.
A company in the Very High bucket is in a structural state where severe losses are materially more likely than the universe base rate. It is not predicted to lose. The false positive rate — the proportion of Very High classifications that do not produce severe losses in the subsequent twelve months — is substantial. Most Very High entries do not produce catastrophic outcomes in any given twelve-month window. The framework identifies the structural condition. What an investor does with that identification is theirs to determine.
It does not tell you when.
The most common misuse of the framework is treating a Very High classification as a timing signal. It is not. A company can remain structurally fragile for extended periods if the narrative sustaining its valuation holds. The pre-2020 inversion is the empirical proof: the framework correctly identified structurally fragile companies throughout 2017–2019. The market rewarded them anyway. The structural condition was real. The timing was not derivable from the score.
The signal attenuates during narrative expansion.
The cross-sectional Spearman r is lowest when narrative premium is highest — when the market is most aggressively rewarding exactly the companies the framework flags as fragile. The factor-adjusted alpha holds across regimes, but rank-order consistency across the full distribution does not. Investors using the framework during extended bull markets should calibrate expectations for attenuated cross-sectional signal while recognizing the structural condition is still present and measurable.
The framework does not incorporate sector context.
A pre-revenue biotech and a pre-profitable consumer staples company receive equivalent OAL rung assignments. Both are measured against the same structural standard — what has this company actually sustained over seven years — regardless of the structural differences in their industries. Sector context can and should be applied as a layer of interpretation on top of the composite score. It is not embedded in the framework.
The seven-year window is more demanding — and more conservative.
Under the prior three-year specification, the majority of the scored universe qualified for the FCF anchor rung. Under the seven-year specification, the universe distributes more broadly — fewer companies have sustained positive free cash flow across a full seven-year trailing window, and more qualify at net income. The longer window requires more sustained performance, which is the point. But it also means the framework is slower to recognize genuine improvement when a company's operational trajectory changes materially. That lag is a design choice with a cost.
Score accuracy varies by data quality at the rung level.
Scores derive from Financial Modeling Prep API data, which has known quality issues including malformed rows, gaps in quarterly history for smaller companies, and occasional stale figures. The seven-year trailing window amplifies this risk: a data gap that would have been inconsequential over three years can affect rung assignment over seven. Score accuracy for companies near the boundary between OAL rungs should be treated with interpretive caution, particularly for smaller companies where quarterly data coverage is thinner.
The structural premium decomposes into two independent contributions, each verifiable on its own terms.
The first is exclusion. A cap-weighted broad market portfolio that simply removes companies scoring in the High or Very High composite buckets, with no positive selection and no concentration, produces a materially different return and risk profile than the unfiltered universe. Under the seven-year anchor specification, the OSMR-Filtered benchmark returned +15.4% annualized against +9.8% for the full universe, an exclusion effect of +5.6 percentage points. Maximum drawdown improved from −26.4% to −21.7%. The Sortino ratio improved from 0.843 to 1.444, a material improvement in risk-adjusted performance achieved through exclusion alone.
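The three portfolio statistics quoted here can be computed from a monthly return series roughly as follows. The conventions are assumptions, since the document does not define them: geometric annualization, a Sortino target of zero with downside deviation taken over all months, and annualization by 12 and the square root of 12.

```python
import math


def annualized_return(monthly):
    """Geometric annualization of a monthly return series."""
    growth = math.prod(1.0 + r for r in monthly)
    return growth ** (12.0 / len(monthly)) - 1.0


def sortino(monthly, target=0.0):
    """Annualized mean excess return over annualized downside deviation."""
    excess = [r - target for r in monthly]
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside == 0.0:
        return math.inf  # no months below target
    return (sum(excess) / len(excess) * 12.0) / (downside * math.sqrt(12.0))


def max_drawdown(monthly):
    """Largest peak-to-trough decline of the cumulative index (<= 0)."""
    level = peak = 1.0
    worst = 0.0
    for r in monthly:
        level *= 1.0 + r
        peak = max(peak, level)
        worst = min(worst, level / peak - 1.0)
    return worst


# Hypothetical monthly returns, for illustration only.
sample = [0.10, -0.20, 0.05, -0.10, 0.30]
print(f"annualized return: {annualized_return(sample):.2%}")
print(f"Sortino ratio:     {sortino(sample):.3f}")
print(f"max drawdown:      {max_drawdown(sample):.2%}")
```

Applied to a filtered and an unfiltered series over the same months, the difference in annualized return is the exclusion effect.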
The second is selection. Concentrating in the Very Low composite bucket — equal-weighted — adds a further premium above the exclusion-only result. The Very Low equal-weight index returned +18.9% annualized post-2013, with a Sortino ratio of 1.317 and maximum drawdown of −34.0%. The selection effect is real but carries a deeper drawdown than the exclusion-only result, due to its equal-weight construction and small-cap tilt.
Together the two effects produce the total structural premium versus the unfiltered universe. The exclusion effect is the more durable and more actionable result for a long-only investor — it requires only that the highest-risk bucket is correctly identifying structural fragility, and the data confirms it does.
Annualized return comparison · equal-weight and cap-weight indexes · post-2013
| Index | Annualized return | Sortino | Max drawdown | Months | Note |
|---|---|---|---|---|---|
| Very Low Risk (EW, post-2013) | +18.9% | 1.317 | −34.0% | 145 | |
| OSMR-Filtered Broad Market (CW) | +15.4% | 1.444 | −21.7% | 145 | exclusion only |
| Full Scored Universe (CW) | +9.8% | 0.843 | −26.4% | 145 | no filter |
| SPY (external reference, approx.) | +14–16% | ~1.1 | ~−34% | — | cap-weighted large-cap |
Very Low index: equal-weight, post-2013 (avg. ~123 constituents/month). Full-period figure includes 2009–2013 recovery with thinner constituent counts. Post-2013 (+18.9%) is the appropriate reference. OSMR-Filtered: cap-weight, excludes axis1 or axis2 in {High, Very High}, post-2013. Transaction costs and market impact not modeled.
| Bucket | Median | Geo Mean | CVaR (95%) | <−25% | N |
|---|---|---|---|---|---|
| Very Low | +11.4% | +10.0% | −50.4% | 10.5% | 18,137 |
| Low | +10.0% | +7.8% | −53.4% | 11.8% | 73,478 |
| Moderate | +9.5% | +5.8% | −60.4% | 14.2% | 114,983 |
| High | +7.8% | +1.4% | −69.9% | 17.0% | 51,952 |
| Very High | −9.6% | −21.4% | −88.9% | 39.6% | 26,695 |
CVaR (95%): average loss in the worst 5% of 12-month outcomes. "<−25%": percentage of observations with 12-month return below −25%.
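The two tail measures defined above can be sketched in a few lines. How the tail size is rounded when 5% of the sample is not a whole number is an assumption here (rounded up, with at least one observation); the returns are hypothetical.

```python
import math


def cvar(returns, tail=0.05):
    """Average return across the worst `tail` share of observations."""
    ordered = sorted(returns)
    k = max(1, math.ceil(tail * len(ordered)))  # assumed rounding rule
    return sum(ordered[:k]) / k


def share_below(returns, threshold=-0.25):
    """Share of observations with a return below the threshold."""
    return sum(r < threshold for r in returns) / len(returns)


# Hypothetical bucket of 40 twelve-month returns.
bucket = [-0.90, -0.80] + [-0.30] * 4 + [0.10] * 34

print(f"CVaR (95%): {cvar(bucket):.1%}")         # mean of the worst 2 of 40
print(f"share < -25%: {share_below(bucket):.1%}")
```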
CVaR (95%) by composite bucket · average loss in worst 5% of 12-month windows
Very High bucket: 39.6% of observations ended below −25% over the subsequent 12 months. Very Low bucket: 10.5%.
% of observations in each return band · Very Low: N=18,137 · Very High: N=26,695
39.6% of Very High composite observations produced losses exceeding 25% over 12 months. A CVaR of −88.9% means that in the worst 5% of outcomes, the average loss was close to total. The Very High mean return of approximately +9.5%, pulled upward by a small subset of large positive outcomes, overstates the typical outcome by a substantial margin. Median and geometric mean are the correct measures for structural risk evaluation.
Design decisions
Equal weighting over empirically-derived weighting
The composite weights Axis 1 and Axis 2 equally at 50% each. Empirical testing shows that tilting the weight toward Axis 2 improves composite r by approximately 0.007 — a difference that is within bootstrap confidence interval overlap and does not justify abandoning interpretive symmetry. Both axes capture distinct structural dimensions; claiming one deserves 70% weight requires stronger theoretical justification than the data currently provides.
Simplest method that works over mathematical refinement
Where simpler measures outperform complex ones against historical data, the simpler measure is used. Axis 2 uses YoY consistency rather than tanh transformation or R-squared trend fitting — both of which were tested and underperformed. Mathematical complexity is not a virtue when it disconnects the measure from the phenomenon it captures.
Global ranking for cross-sectional comparability
Both Axis 1 and Axis 2 are expressed as global percentile ranks across the full universe after anchor penalties are applied. This makes scores directly comparable across companies anchored on different OAL rungs — essential for a framework designed to evaluate structural risk at the universe level.
Conservative anchor assignment on a seven-year trailing window
OAL assignment uses the 7-year cumulative sum of the relevant financial series. This requires sustained demonstrated performance rather than rewarding a single strong quarter or recent trend. A company must demonstrate positive 7-year cumulative FCF to qualify for OAL 1. Adjusted metrics, normalized earnings, and forward projections are not considered.
Static weights across market environments
The composite formula and axis weights do not shift based on market environment or detected volatility regime. Signal strength varies materially across regimes — as the validation section documents — and users operating in a specific environment should apply judgment accordingly. But a model that silently changes its weights is harder to interpret, harder to audit, and more susceptible to overfitting. Stability is a deliberate choice, not an oversight.
The structural case for narrative risk as a measurable, manageable condition in equity portfolios rests on the evidence in this document — not on assertion. The framework identified the structural condition that preceded significant losses across 17 years and three market regimes. The factor-adjusted alpha is positive and statistically significant across all three market regimes. The premium decomposes into two independently verifiable mechanisms. The limitations are named, not footnoted.
What the evidence supports is a specific and bounded claim: that the distance between what a company has built and what its valuation requires is measurable, that it predicts return distributions in a consistent direction, and that managing exposure to the highest-risk tail of that distribution produces a measurably different portfolio. Not a guarantee. Not a timing signal. A structural lens that makes a specific kind of risk visible before it resolves.
Appendix — Methodology specification
Backtest parameters
285,245 observations · 2009-01 through 2025-02 formation window · 12-month forward return horizon · $5 minimum price filter at formation · bootstrap 95% confidence intervals (1,000 samples).
OAL assignment
Each company evaluated FCF → NI → EBIT → Revenue in sequence; assigned to first qualifying rung based on positive 7-year cumulative figure. Negative EV companies excluded. No adjusted or normalized metrics considered.
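The assignment rule can be sketched as a short function. The dictionary layout and the requirement of a complete 28-quarter history are assumptions of this example, not details confirmed by the specification.

```python
from typing import Optional, Sequence

# Rung order from deepest to shallowest, as listed in the specification.
RUNG_ORDER = ("FCF", "NI", "EBIT", "Revenue")
WINDOW_QUARTERS = 28  # seven years of quarterly data


def assign_rung(series: dict[str, Sequence[float]]) -> Optional[str]:
    """Return the first rung whose trailing 28-quarter sum is positive.

    `series` maps a rung name to quarterly values, oldest first.
    """
    for rung in RUNG_ORDER:
        values = series.get(rung, ())
        if len(values) < WINDOW_QUARTERS:
            continue  # assumption: incomplete history cannot qualify
        if sum(values[-WINDOW_QUARTERS:]) > 0:
            return rung
    return None  # no rung qualifies


# Hypothetical company: recent FCF is positive, but the 7-year sum is not.
example = {
    "FCF": [-1.0] * 20 + [2.0] * 8,  # cumulative -4, fails
    "NI": [0.5] * 28,                # cumulative +14, qualifies
    "EBIT": [1.0] * 28,
    "Revenue": [10.0] * 28,
}
print(assign_rung(example))  # NI
```

The example company has eight strong recent quarters of free cash flow yet lands on the net income rung, because the cumulative seven-year figure is what counts.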
Axis 1 formula
log(EV / anchor_7yr) + OAL_shallowness_penalty → global percentile rank (ascending = more risk). Anchor: 7-year cumulative sum of OAL-appropriate series (28 quarters). Penalties: FCF=0, NI=1.0, EBIT=1.618, Rev=4.236.
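A minimal sketch of the Axis 1 formula, using the penalties listed above. The percentile convention (average rank for ties, scaled to 100) and the three example companies are assumptions for illustration.

```python
import math
from bisect import bisect_left, bisect_right

# Shallowness penalties as listed in the specification.
PENALTY = {"FCF": 0.0, "NI": 1.0, "EBIT": 1.618, "Revenue": 4.236}


def axis1_raw(ev: float, anchor_7yr: float, rung: str) -> float:
    """log(EV / anchor_7yr) plus the rung's shallowness penalty."""
    if ev <= 0 or anchor_7yr <= 0:
        raise ValueError("EV and the 7-year anchor must both be positive")
    return math.log(ev / anchor_7yr) + PENALTY[rung]


def percentile_rank(values):
    """Ascending percentile rank in (0, 100]; ties share an average rank."""
    s = sorted(values)
    n = len(s)
    return [100.0 * (bisect_left(s, v) + bisect_right(s, v) + 1) / (2 * n)
            for v in values]


# Hypothetical companies: (name, EV, 7-year cumulative anchor, rung).
companies = [
    ("A", 1200.0, 100.0, "FCF"),
    ("B", 6000.0, 100.0, "FCF"),
    ("C", 1200.0, 100.0, "Revenue"),
]
raw = [axis1_raw(ev, anchor, rung) for _, ev, anchor, rung in companies]
pct = percentile_rank(raw)
for (name, *_), score, rank in zip(companies, raw, pct):
    print(f"{name}: raw={score:.3f} percentile={rank:.1f}")
```

Companies A and C have the same raw EV-to-anchor ratio, but C is revenue-anchored, so the penalty pushes it above even the more stretched FCF-anchored company B.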
Axis 2 formula
YoY consistency score across up to 28 quarters of OAL-anchor history + OAL_shallowness_penalty → global percentile rank. Four most recent YoY comparisons receive double weight. Same penalties as Axis 1.
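The specification does not define the YoY consistency score itself, so the sketch below substitutes one plausible reading: the weighted share of year-over-year comparisons in which the anchor series declined, with the four most recent comparisons given double weight. The penalty and ranking steps are left out because the scale of the real score relative to the penalties is not stated.

```python
from typing import Sequence


def yoy_degradation_share(quarterly: Sequence[float]) -> float:
    """Weighted share of YoY comparisons showing a decline (0 to 1).

    Each quarter is compared with the same quarter one year earlier,
    over at most the trailing 28 quarters. This scoring rule is an
    assumption, not the framework's documented formula.
    """
    history = list(quarterly)[-28:]
    declined = [history[t] < history[t - 4] for t in range(4, len(history))]
    if not declined:
        raise ValueError("need at least five quarters for one comparison")
    weights = [1.0] * len(declined)
    for k in range(1, min(4, len(declined)) + 1):
        weights[-k] = 2.0  # four most recent comparisons count double
    flagged = sum(w for w, d in zip(weights, declined) if d)
    return flagged / sum(weights)


# Hypothetical anchor histories (28 quarters, oldest first).
improving = list(range(1, 29))
recently_eroding = list(range(1, 25)) + [20, 19, 18, 17]

print(yoy_degradation_share(improving))
print(round(yoy_degradation_share(recently_eroding), 4))
```

In the second series only the last four of 24 comparisons show a decline. Unweighted that would be 4/24; with double weight on recent comparisons it becomes 8/28.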
Composite formula
(axis1_pct + axis2_pct) / 2. Equal weight. Expressed as percentile rank across full universe. Long-run static model — weights do not shift across market regimes. Axis 3 is a contextual disclosure layer, not included in composite.
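As a sketch, the composite is the mean of the two axis percentiles, re-ranked across the universe. The tie-handling convention and the four example companies are assumptions.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(values):
    """Ascending percentile rank in (0, 100]; ties share an average rank."""
    s = sorted(values)
    n = len(s)
    return [100.0 * (bisect_left(s, v) + bisect_right(s, v) + 1) / (2 * n)
            for v in values]


def composite(axis1_pct, axis2_pct):
    """Equal-weight mean of both axes, expressed as a percentile rank."""
    if len(axis1_pct) != len(axis2_pct):
        raise ValueError("both axes must cover the same companies")
    means = [(a + b) / 2 for a, b in zip(axis1_pct, axis2_pct)]
    return percentile_rank(means)


# Hypothetical universe of four companies.
axis1 = [90.0, 95.0, 20.0, 50.0]  # valuation stretch percentiles
axis2 = [10.0, 92.0, 30.0, 50.0]  # anchor degradation percentiles

print(composite(axis1, axis2))  # [62.5, 100.0, 25.0, 62.5]
```

The first company is stretched but improving and lands mid-table. Only the second, which is high on both axes, reaches the top of the composite ranking.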
Factor test
Fama-French 5 factors + Momentum (Ken French data library). OLS with Newey-West HAC standard errors (3-lag). Four portfolio series: Very Low EW, Very High EW, L/S (Very Low minus Very High), Broad EW. 205 months.
Classification test
Loss event = 12-month forward return < −25%. Base rate: 16.24%. AUC (trapezoidal rule): OSMR 0.6174, valuation screen 0.5957, random 0.500, momentum baseline 0.3996. Very High bucket: 39.6% of observations below −25%. Very Low: 10.5%. Relative risk of the Very High bucket versus the base rate: 2.44×.
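A hedged sketch of the classification test: an ROC curve swept from the highest risk score downward, integrated with the trapezoidal rule, plus the relative-risk arithmetic from the figures above. The six scored observations are invented for illustration.

```python
def roc_auc_trapezoid(scores, is_loss):
    """Area under the ROC curve via the trapezoidal rule.

    `scores` are risk scores (higher means riskier); `is_loss` holds
    1 for a loss event and 0 otherwise. Tied scores move together.
    """
    pos = sum(is_loss)
    neg = len(is_loss) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one loss and one non-loss")
    pairs = sorted(zip(scores, is_loss), key=lambda p: -p[0])
    auc = 0.0
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        tpr, fpr = tp / pos, fp / neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return auc


# Hypothetical scores and loss flags.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
losses = [1, 1, 0, 1, 0, 0]
print(round(roc_auc_trapezoid(scores, losses), 4))  # 0.8889

# Relative risk of the Very High bucket versus the base rate.
print(round(0.396 / 0.1624, 2))  # 2.44
```

An AUC of 0.5 corresponds to random ordering. A value below 0.5 means the score tends to rank loss events lower than non-events.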
Index construction
Equal-weight (EW): simple average of 1-month returns across bucket constituents, monthly rebalancing. Cap-weight (CW): market-cap weighted average. Price filter ≥$5 applied at formation. Transaction costs not modeled.
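The two weighting schemes can be illustrated for a single month as follows. The record layout (`price`, `cap`, `ret`) is hypothetical, and `price` is taken to be the price at formation.

```python
def monthly_index_return(constituents, weighting="EW", min_price=5.0):
    """One-month index return for a bucket of constituents.

    Each constituent is a dict with `price` (at formation), `cap`
    (market capitalization) and `ret` (1-month return).
    """
    eligible = [c for c in constituents if c["price"] >= min_price]
    if not eligible:
        raise ValueError("no constituents pass the price filter")
    if weighting == "EW":
        return sum(c["ret"] for c in eligible) / len(eligible)
    if weighting == "CW":
        total_cap = sum(c["cap"] for c in eligible)
        return sum(c["ret"] * c["cap"] for c in eligible) / total_cap
    raise ValueError("weighting must be 'EW' or 'CW'")


# Hypothetical bucket for one month.
bucket = [
    {"price": 50.0, "cap": 900.0, "ret": 0.10},
    {"price": 12.0, "cap": 50.0, "ret": -0.05},
    {"price": 8.0, "cap": 50.0, "ret": 0.02},
    {"price": 3.0, "cap": 20.0, "ret": 0.40},  # below $5, excluded
]

print(f"EW: {monthly_index_return(bucket, 'EW'):.4f}")
print(f"CW: {monthly_index_return(bucket, 'CW'):.4f}")
```

The large-cap constituent dominates the cap-weighted result, while the equal-weighted result gives the two small caps the same influence. That difference is one reason the equal-weight and cap-weight indexes in the comparison table carry different drawdown profiles.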
The Capital Steward, LLC · thecapitalsteward.com
© 2026 The Capital Steward, LLC. For informational purposes only. Not investment advice. Past performance of a backtest does not guarantee future results.