I Built a Financial Defense System With AI Agents During a Real Market Crisis

On March 27, 2026, the financial report I built with an AI agent was showing 100/100 CRITICAL risk. Seven critical signals. Fifty-two warnings. The Strait of Hormuz was closed. Brent crude had hit $104. The VIX crossed 30. Consumer confidence was at 56.6 — recessionary territory.

This wasn’t a demo. My 401(k) was real. And I needed to make a decision: pull money out and eat the penalty, or ride it out and hope.

The report that started as a learning project became personal

A few weeks earlier, I’d built a daily AI-powered market report as a PM-who-builds exercise. It pulls live data from yfinance (80+ tickers) and the FRED API (12 macroeconomic series), then runs a multi-layer risk scoring engine (technical signals, macro indicators, fundamentals) weighted so that leading indicators count 1.5x as much as lagging ones. It generates a static HTML report, deploys to GitHub Pages via GitHub Actions, and rebuilds automatically.
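To make the weighting idea concrete, here is a minimal sketch of a score that counts leading indicators 1.5x. It is not the report's actual engine: the `Signal` class, the point values, and which signals are tagged as leading are all illustrative assumptions. Only the 1.5x multiplier comes from the description above.

```python
from dataclasses import dataclass

# Only the 1.5x ratio comes from the post; everything else is hypothetical.
LEADING_WEIGHT = 1.5
LAGGING_WEIGHT = 1.0


@dataclass
class Signal:
    name: str
    points: float   # raw severity points on an illustrative scale
    leading: bool   # True for a leading indicator, False for a lagging one


def weighted_score(signals: list[Signal]) -> float:
    """Sum signal points, counting leading indicators 1.5x."""
    return sum(
        s.points * (LEADING_WEIGHT if s.leading else LAGGING_WEIGHT)
        for s in signals
    )


# Illustrative classification only.
signals = [
    Signal("Yield curve inversion", 20, leading=True),
    Signal("Death cross", 10, leading=False),
]
print(weighted_score(signals))  # 40.0
```

Keeping the multiplier in one named constant makes it easy to test how sensitive the total is to that single design choice.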

I built it to learn about financial markets. Then the Iran conflict escalated. Oil supply dropped by 8 million barrels per day — the IEA called it the largest disruption in history. The S&P entered correction territory. Twenty-two stocks in my watchlist hit death crosses. Eighteen had deteriorating EPS estimates.

The report went from “interesting project” to “the tool I check before my morning coffee.” And on March 27, it was screaming.

The 401(k) question nobody’s AI assistant can answer

At 54, facing a potential market crash, the math on an early 401(k) withdrawal looks like this:

10% early withdrawal penalty (under 59½)
~29% federal + state income tax (the withdrawal counts as ordinary income)
Total cost: ~39% of whatever you pull out

A $100,000 withdrawal nets you roughly $61,000 in cash. And that $100,000, if left invested, compounds to roughly $200,000 by age 65 at historical average returns.

So the breakeven question is: does the market drop more than 39%, and then fail to recover within your investment horizon? That’s the only scenario where early withdrawal wins.
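The withdrawal arithmetic above can be sketched in a few lines. The function names are hypothetical, and the 6.5% annual return is an assumption picked only because it roughly reproduces the "$200,000 by age 65" figure over 11 years. The post does not state which return it used.

```python
def net_after_withdrawal(
    amount: float, penalty: float = 0.10, income_tax: float = 0.29
) -> float:
    """Cash left after the early-withdrawal penalty and income tax."""
    return amount * (1 - penalty - income_tax)


def future_value(amount: float, annual_return: float, years: int) -> float:
    """Value of the same amount if left invested, compounded annually."""
    return amount * (1 + annual_return) ** years


net = net_after_withdrawal(100_000)

# Assumed return: 6.5% per year for 11 years (age 54 to 65).
stay = future_value(100_000, annual_return=0.065, years=11)

print(round(net))   # 61000
print(round(stay))  # just under 200,000 with these assumptions
```

Changing `income_tax` or `annual_return` shows how quickly the comparison shifts, which is why the real calculator is parameterized.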

I built the math. Researched every major US crash from 1907 to 2020. Mapped the percentages, the recovery times, the conditions.

Historical verdict: early withdrawal was correct in 1 of 8 major crashes. Only 1929, and that was under conditions that can’t recur: no FDIC, no SEC, and a Federal Reserve that failed to act as a backstop. The banking system itself failed. Every other crash, including the 1973 oil shock, 1987 Black Monday, the 2000 dot-com bust, the 2008 financial crisis, and 2020 COVID, recovered within 2 to 7 years while early withdrawers locked in permanent losses.

There was one more factor. The Rule of 55: if you leave your employer in or after the calendar year you turn 55, you can withdraw from that employer’s 401(k) without the 10% penalty. I was 9 months away. That single rule would save roughly $20,000 on a $200,000 withdrawal.
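The Rule of 55 test is a calendar-year comparison, not a birthday comparison, which is easy to get wrong. A minimal sketch, assuming hypothetical function names and made-up dates (not mine), and ignoring plan-specific restrictions:

```python
from datetime import date


def rule_of_55_applies(birth_date: date, separation_date: date) -> bool:
    """True if separation falls in or after the calendar year of the 55th birthday."""
    return separation_date.year >= birth_date.year + 55


def penalty_saved(amount: float, penalty: float = 0.10) -> float:
    """Penalty avoided on a withdrawal when the 10% penalty does not apply."""
    return amount * penalty


# Hypothetical dates for illustration only.
born = date(1972, 6, 15)
print(rule_of_55_applies(born, date(2026, 12, 31)))  # False
print(rule_of_55_applies(born, date(2027, 1, 2)))    # True
print(round(penalty_saved(200_000)))                 # 20000
```

Note that the second case qualifies even though the separation date is before the actual birthday, because only the calendar year matters.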

The verdict: defensive rebalance within the 401(k), not withdrawal. Move from stocks to bonds, stable value funds, or money market — inside the tax shelter. It costs $0. It’s reversible. And it protects against the downside without the 39% haircut.

Going deeper: what the oil price didn’t tell us

The initial analysis focused on oil prices and market numbers. Then I challenged myself: “We need to make our understanding more robust, more 360 degree.” That challenge changed everything.

The Strait of Hormuz isn’t just an oil chokepoint. It’s the transit route for LNG, helium, fertilizer, petrochemicals, and containerized goods serving half the global economy. The full supply chain cascade:

Helium — Iranian strikes physically destroyed 14% of Qatar’s Ras Laffan helium capacity, the world’s largest facility. 33% of global helium supply went offline. Helium is irreplaceable in semiconductor manufacturing — no viable substitute at required purity. TSMC Arizona reported less than 45 days of reserves. Reconstruction timeline: 3 to 5 years.

LNG — 35 million tons of liquefied natural gas lost this year. Qatar declared force majeure on long-term contracts for up to 5 years. Asia LNG prices up 143%.

Fertilizer — One-third of global fertilizer exports trapped in the Gulf during Northern Hemisphere spring planting. Qatar’s world-largest urea plant shut down.

Pharmaceuticals — 50% of US generic prescriptions come from India, which depends on the Strait for inputs and shipping. Drug cost increases projected within 4-6 weeks.

Shipping — 3,200 ships trapped. War risk insurance up 300%. Container surcharges $500-1,500 per box.

The key insight that changed my analysis: even if fighting stops tomorrow, the physical destruction of Ras Laffan means 3 to 5 years of reduced helium and LNG supply. This is structural damage, not a temporary disruption. The semiconductor, AI, and energy industries will feel this through 2028-2031.

I built an ACTION_TRIGGER_FRAMEWORK.md with 10 concrete if/then triggers — each one tied to a specific data point (VIX level, oil price, S&P drawdown, consumer confidence threshold) and a specific action (rebalance, escalate, hold). No ambiguity. When X crosses Y, do Z.
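One way to express "when X crosses Y, do Z" as data rather than prose is sketched below. This is not the contents of the framework file. The `Trigger` class, the metric keys, the thresholds, and the actions assigned to them are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Trigger:
    label: str                      # e.g. "A"
    metric: str                     # key into the market-data dict
    fires: Callable[[float], bool]  # the "when X crosses Y" condition
    action: str                     # the "do Z" part


# Hypothetical thresholds and actions, for illustration only.
TRIGGERS = [
    Trigger("A", "vix", lambda v: v >= 30, "rebalance"),
    Trigger("B", "brent_usd", lambda v: v >= 100, "escalate"),
]


def evaluate(triggers: list[Trigger], data: dict[str, float]) -> dict[str, str]:
    """Map each trigger label to its action if it fires, else 'hold'."""
    return {
        t.label: t.action if t.fires(data[t.metric]) else "hold"
        for t in triggers
    }


result = evaluate(TRIGGERS, {"vix": 31.2, "brent_usd": 95.0})
print(result)  # {'A': 'rebalance', 'B': 'hold'}
```

Writing each trigger as a small record keeps the threshold and the action side by side, so there is no room to reinterpret the rule in the moment.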

From research to running code

Research documents are useful. Running code is better.

I built a Personal Defense Dashboard — a local-only Python module at financial-agent/src/personal/ that evaluates all 10 triggers against live market data and renders a Rich terminal UI with the results.

The architecture:

triggers.py — encodes all 10 triggers (A through J) as executable rules. Four auto-evaluate from live market data (VIX, oil, S&P drawdown, consumer sentiment). Six use manual flags from a JSON config that I update when reading the news (helium supply, LNG status, fertilizer impact, shipping disruption, pharma risk, geopolitical escalation).
calculator.py — parameterized 401(k) model: withdrawal cost at any amount, Rule of 55 comparison, loan option, SECURE 2.0 provisions, breakeven analysis against live S&P, compounding cost through age 65, federal tax bracket math.
convergence.py — counts active and severe triggers, determines Level 0 through 3 (Calm, Caution, High Alert, Crisis), generates a situational recommendation.
dashboard.py — Rich terminal dashboard: fetches live data, evaluates all triggers, runs the 401(k) math, displays a convergence banner, trigger table, decision math, supply chain cascade position, and a plain-text recommendation.
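The convergence step described above reduces to mapping two counts onto four levels. A minimal sketch follows. The level names come from the list above, but the cutoffs are hypothetical and are not the values in convergence.py.

```python
LEVEL_NAMES = ("Calm", "Caution", "High Alert", "Crisis")


def convergence_level(active: int, severe: int) -> int:
    """Return a level from 0 to 3 based on active and severe trigger counts."""
    # Hypothetical cutoffs, chosen for illustration only.
    if active >= 7 or severe >= 3:
        return 3
    if active >= 4 or severe >= 1:
        return 2
    if active >= 1:
        return 1
    return 0


level = convergence_level(active=9, severe=4)
print(level, LEVEL_NAMES[level])  # 3 Crisis
```

Checking the most severe condition first means a single severe trigger can raise the level even when few triggers are active overall.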

First live run: Level 3 CRISIS. Nine of ten triggers active. Four severe.

And the verdict was still the same: stay invested. Defensive rebalance, not withdrawal. The math doesn’t change just because the dashboard is red. The breakeven gap was +32 percentage points: the S&P would need to fall to roughly 4,200 before market losses matched the roughly 39% cost of withdrawing. That’s a 34% crash from current levels, sustained long enough to eliminate recovery.
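Here is one plausible way to compute a breakeven level and gap like the ones above. It assumes the gap is the withdrawal cost minus the drawdown already taken from the peak, which is my reading rather than a documented formula. The index levels in the example are hypothetical round numbers, not the values from that day.

```python
def breakeven(
    peak: float, current: float, withdrawal_cost: float = 0.39
) -> tuple[float, float, float]:
    """Return (breakeven index level, gap in pct points, further drop from current in %)."""
    drawdown = 1 - current / peak                    # loss already taken from the peak
    level = peak * (1 - withdrawal_cost)             # index level where loss equals cost
    gap_points = (withdrawal_cost - drawdown) * 100  # remaining cushion, in pct points
    further_drop = (1 - level / current) * 100       # additional fall needed from here
    return level, gap_points, further_drop


# Hypothetical inputs for illustration only.
level, gap, drop = breakeven(peak=7000, current=6300)
print(round(level), round(gap), round(drop))  # 4270 29 32
```

The gap and the further drop differ because one is measured against the peak and the other against the current level.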

The architecture decision that mattered most: automate what you can, make manual input easy for the rest. Market data flows in automatically. But supply chain triggers — helium reserves, shipping insurance rates, fertilizer availability — have no free API. So they live in a signals.json file that takes 30 seconds to update when I read something new. Perfect automation would have delayed the build by weeks. Good-enough automation shipped in a day.
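A minimal sketch of the manual-flag side of that split: read a small JSON file and default anything missing to False. The key names are hypothetical stand-ins for the six manually tracked triggers, and the file layout is an assumption, not the actual signals.json schema.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical flag names, one per manually tracked trigger.
MANUAL_KEYS = (
    "helium_supply",
    "lng_status",
    "fertilizer_impact",
    "shipping_disruption",
    "pharma_risk",
    "geopolitical_escalation",
)


def load_manual_flags(path: Path) -> dict[str, bool]:
    """Read manual flags from JSON; a missing file or key defaults to False."""
    raw = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return {key: bool(raw.get(key, False)) for key in MANUAL_KEYS}


# Demo: write a small signals.json to a temp directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    signals_path = Path(tmp) / "signals.json"
    signals_path.write_text(
        json.dumps({"helium_supply": True, "lng_status": True}), encoding="utf-8"
    )
    flags = load_manual_flags(signals_path)

print(sum(flags.values()))  # 2
```

Defaulting to False means a forgotten or deleted file cannot silently raise the alert level. A stricter design might refuse to run instead.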

When the model said STABLE at 350

A week later, I noticed the risk score had been dropping — 744 to 287 over a few days — even though 26 death crosses, 18 deteriorating EPS estimates, and 37 insider selling signals were still active. Something was wrong.

The diagnosis: a structural flaw in how the engine weighted signals. Three stocks having a bad week contributed 75 points. All structural breadth signals combined — death crosses, EPS deterioration, insider selling across the entire watchlist — contributed 74 points. Per-ticker volatility was drowning out market-wide deterioration.

But the worse bug was in the forward projection. The old compute_projection() function said “STABLE” at a score of 350. Three root causes:

It only scanned 5 hardcoded FRED series — most macro indicators were invisible
The threshold required 3+ warnings to register anything; single critical signals scored zero
No absolute-level awareness — it only looked at deltas, never at “we’re already in crisis territory”

If you’re at 350 and drop to 300, that’s still a crisis. But a delta-only model calls it “improving” or “stable.”

The fix: scan all macro indicators, lower the threshold so single critical signals register, and add an absolute-level floor — if the score is above 200, the projection starts from “stressed” not “neutral.” Context-aware labels: “STRESSED, HOLDING” replaces “STABLE” at high scores. “EASING” replaces “IMPROVING” when you’re easing from severe to merely bad.
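The absolute-level floor can be sketched as a label function that looks at the score before it looks at the change. The 200 floor and the "STRESSED, HOLDING", "EASING", "STABLE", and "IMPROVING" labels come from the description above. The function name, the tolerance, and the two worsening labels are assumptions, and this is not the real compute_projection().

```python
STRESS_FLOOR = 200  # from the post: above this, start from "stressed"


def projection_label(score: float, delta: float, tolerance: float = 10) -> str:
    """Label the outlook from the current score and its recent change."""
    stressed = score > STRESS_FLOOR
    if delta > tolerance:
        # Hypothetical labels for a rising score.
        return "STRESSED, WORSENING" if stressed else "DETERIORATING"
    if delta < -tolerance:
        return "EASING" if stressed else "IMPROVING"
    return "STRESSED, HOLDING" if stressed else "STABLE"


print(projection_label(350, 0))    # STRESSED, HOLDING
print(projection_label(300, -50))  # EASING
print(projection_label(120, 0))    # STABLE
```

The key property is that no combination of deltas can produce "STABLE" or "IMPROVING" while the score sits above the floor.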

Then I asked myself the question that should come first, not last: “Are we following best practices?” This led to a web research sprint comparing our model against the CNN Fear & Greed Index (7 components), the OFR Financial Stress Index (33 variables), and the Chicago Fed NFCI (105 indicators). The research validated some of our design and exposed gaps. We added 6 new signals: percentage of watchlist above the 200-day moving average (the institutional standard for breadth), 52-week highs vs. lows, S&P 500 drawdown from peak, safe-haven rotation, signal convergence amplification, and a new “elevated” severity tier.
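As an example of one of those breadth signals, the share of a watchlist trading above its 200-day moving average can be computed from closing prices alone. This is a plain-Python sketch with a hypothetical function name and toy data, not the report's code. The demo uses a 3-day window so the numbers are easy to check by hand.

```python
def pct_above_sma(closes_by_ticker: dict[str, list[float]], window: int = 200) -> float:
    """Percent of tickers whose latest close is above their simple moving average."""
    eligible = 0
    above = 0
    for closes in closes_by_ticker.values():
        if len(closes) < window:
            continue  # not enough history to compute the average
        sma = sum(closes[-window:]) / window
        eligible += 1
        if closes[-1] > sma:
            above += 1
    return 100.0 * above / eligible if eligible else float("nan")


# Toy data with a 3-day window for readability.
watchlist = {
    "AAA": [1, 2, 3, 4],  # latest 4 > average 3: above
    "BBB": [4, 3, 2, 1],  # latest 1 < average 2: below
    "CCC": [5],           # too short: skipped
}
print(pct_above_sma(watchlist, window=3))  # 50.0
```

Because every ticker counts equally, a few volatile names cannot dominate this number the way they can in a per-ticker point sum.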

The score went from 287 to 376. The +89 points properly reflected structural deterioration that the old model was suppressing.

The broader lesson: any scoring system that only tracks change-over-time will be blind to sustained extremes. A dashboard that says “improving” during a structural crisis because the rate of deterioration slowed is worse than no dashboard at all. It builds false confidence.

What I actually learned

Building for real stakes changes how you build. Not harder — more honest.

You catch the projection bug at 350 because the wrong answer costs you money, not just credibility. You push past the initial oil analysis into helium, LNG, and fertilizer because your 401(k) isn’t diversified enough to ignore semiconductor supply chains. You add 10 triggers instead of 3 because the real question isn’t “is the market bad?” — it’s “is it bad enough, in the specific ways that matter to me, to justify an irreversible decision?”

AI agents are most useful in the gap between spreadsheet and financial advisor. A spreadsheet can’t synthesize 40+ research sources, run live API calls, evaluate triggers against real-time data, and generate a contextual recommendation. A financial advisor charges $200-500/hour for generic advice that doesn’t account for my specific 401(k) balance, tax bracket, or Rule of 55 timeline.

The combination of domain research + live data + personal parameters + AI analysis produced something no single tool could: a decision framework calibrated to my specific situation, running against live data, with clear triggers for when the plan should change.

What I’d tell someone facing a similar question: Don’t withdraw. Do rebalance within the tax shelter. Do build the triggers so you know when your plan should change — not from gut feel, but from specific thresholds tied to specific data. And remember the historical base rate: early withdrawal was correct in 1 of 8 major crashes, under conditions that can’t recur.

The dashboard still shows Level 3 CRISIS. Nine triggers active. The verdict hasn’t changed. But I’m not guessing anymore.

The public market report is live at vcrosby22.github.io/financial-reports. The personal defense dashboard is local-only — no financial data leaves my machine. For how the project started: “Why Product Managers Should Build Things”. Built with Cursor. Validated against my own 401(k).