Backtesting: The Art of Honest Simulation
What You Will Learn
- What backtesting actually proves—and what it cannot prove.
- The core principles for building trustworthy backtests.
- How to bridge the gap between simulation and live trading.
The Core Idea
A backtest is not a crystal ball. It doesn’t tell you the future. It doesn’t prove your strategy “works.”
What it does: test whether your hypothesis held true in historical data, under the conditions you specified.
The right question isn’t “did this strategy make money in the backtest?” It’s “did this strategy behave consistently with my thesis about why it should work?”
A botter treats backtesting as hypothesis validation, not profit prediction. The moment you confuse the two, you’ve already started lying to yourself.
What Backtesting Actually Proves
A backtest proves exactly one thing: given this data, these rules, and these assumptions, here’s what would have happened.
That’s it. Nothing more.
It does not prove:
- That your strategy will make money in the future.
- That the patterns you found are real (vs. noise).
- That your assumptions match reality.
It does prove:
- Whether your rules are internally consistent.
- How your strategy would have responded to specific historical conditions.
- The rough shape of returns, drawdowns, and trade frequency—if your inputs are honest.
The value of backtesting isn’t prediction. It’s stress-testing your logic against known history. If your strategy can’t survive the past, it won’t survive the future.
The Honest Backtest Checklist
Most backtests lie. Not intentionally—but through omission. An honest backtest requires deliberate effort.
Include realistic execution costs. Every trade costs more than the fee. Add spread (the gap between bid and ask), slippage (price movement from your order), and exchange fees. If your backtest assumes perfect fills at the close price, you’re testing a fantasy.
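To make this concrete, here is a minimal sketch of a fully-loaded cost model for a market buy. All the numbers and the rate parameters are hypothetical inputs, not real exchange fees:

```python
# A minimal sketch of fully-loaded execution cost for a taker buy.
# All figures are hypothetical inputs, not real exchange fees.

def fully_loaded_buy_price(mid_price: float,
                           spread: float,
                           fee_rate: float,
                           slippage_rate: float) -> float:
    """Estimate the effective price paid per unit on a market buy.

    mid_price:      quoted mid between bid and ask
    spread:         absolute bid-ask spread
    fee_rate:       exchange taker fee, e.g. 0.001 for 0.1%
    slippage_rate:  assumed adverse price move, e.g. 0.0005 for 5 bps
    """
    ask = mid_price + spread / 2          # you buy at the ask, not the mid
    filled = ask * (1 + slippage_rate)    # price drifts against your order
    return filled * (1 + fee_rate)        # fee is charged on the fill amount

# Example: 100.00 mid, 0.10 spread, 0.1% fee, 5 bps slippage.
price = fully_loaded_buy_price(100.0, 0.10, 0.001, 0.0005)
# The effective price lands noticeably above the 100.00 "close"
# a naive backtest would assume.
```

Even these few basis points per trade compound quickly at high trade frequency, which is why strategies that look profitable at the close price often die once costs are loaded.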
Respect liquidity constraints. If your strategy trades $100K but the historical order book only had $10K at the best price, your backtest is fiction. Large orders move markets. Your simulation should reflect that.
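One way to model this is to walk the historical order book level by level instead of assuming a single fill price. The book below is hypothetical; levels are (price, size) pairs, best ask first:

```python
# Sketch: walk a (hypothetical) order book to estimate what a large
# market buy would actually cost. Levels are (price, size), best first.

def average_fill_price(order_book, order_size):
    """Volume-weighted fill price, or None if the book is too thin."""
    remaining = order_size
    cost = 0.0
    for price, size in order_book:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            return cost / order_size
    return None  # not enough liquidity to fill the order

# Hypothetical book: limited size at the best price, worse levels behind.
book = [(100.0, 100), (100.5, 200), (101.0, 500)]
small_fill = average_fill_price(book, 50)    # fills entirely at the top level
large_fill = average_fill_price(book, 400)   # walks the book, worse average
```

A backtest that fills every order at the top-of-book price is implicitly assuming your size is always `small_fill`-sized. The moment it isn't, the simulation diverges from reality.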
Eliminate look-ahead bias. Can your strategy only work because it “knew” information that wasn’t available at decision time? Using today’s close to make today’s decision is cheating. So is using earnings data before the announcement.
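The most common fix is mechanical: lag every signal by one bar, so the decision made at bar t's close is only traded at bar t+1. A sketch with hypothetical prices:

```python
# Sketch: a signal computed from bar t's close can only be traded on
# bar t+1. Lagging the signal by one bar removes the most common form
# of look-ahead bias. Prices here are hypothetical.

closes = [100, 102, 101, 105, 104]

# Signal decided at each bar's close: "did this close rise vs the prior bar?"
signal = [None] + [closes[i] > closes[i - 1] for i in range(1, len(closes))]

# WRONG: executing at bar t's close using bar t's own signal -- that
# information did not exist until the bar had already closed.
# RIGHT: shift the signal forward so bar t trades on the decision
# that was finalized at bar t-1.
tradable_signal = [None] + signal[:-1]
```

If shifting your signals by one bar destroys the strategy's edge, the edge was never tradable in the first place.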
Account for survivorship bias. Your crypto dataset probably only includes tokens that still exist. The ones that went to zero and delisted? Missing. This inflates historical returns because you’re only testing survivors.
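A toy illustration of how large the distortion can be. The tokens and returns below are entirely made up; the point is the arithmetic, not the numbers:

```python
# Sketch of survivorship bias with hypothetical return data: averaging
# only over tokens that still exist overstates the universe's return.

full_universe = {
    "token_a": 0.50,   # +50%
    "token_b": 0.20,   # +20%
    "token_c": -1.00,  # went to zero and was delisted
}

# A survivors-only dataset silently drops token_c.
survivors = {k: r for k, r in full_universe.items() if r > -1.0}

avg_survivors = sum(survivors.values()) / len(survivors)
avg_full = sum(full_universe.values()) / len(full_universe)
# The survivors-only average is positive; the true universe average is not.
```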
Use point-in-time data. Financial data gets revised. GDP numbers change. Even price data can be adjusted retroactively. Your backtest should use what was actually known at the time, not the corrected version.
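A point-in-time store keeps every revision together with the date it became public, and queries return only what was known as of the backtest's clock. A minimal sketch with hypothetical dates and values:

```python
# Sketch: a point-in-time series stores each revision with the date it
# became known; queries return only what was known as of a given date.
# Dates and values below are hypothetical.

from bisect import bisect_right

# (known_as_of, value) pairs for one data point, sorted by release date.
revisions = [
    ("2023-01-31", 2.9),  # first release
    ("2023-02-28", 2.6),  # revised
    ("2023-03-31", 2.4),  # final
]

def as_of(revisions, date):
    """Latest value whose release date is <= the query date."""
    dates = [d for d, _ in revisions]
    i = bisect_right(dates, date)       # ISO dates sort lexicographically
    return revisions[i - 1][1] if i else None

first_known = as_of(revisions, "2023-02-15")  # the revision wasn't out yet
final_known = as_of(revisions, "2023-12-01")  # all revisions visible
```

A backtest that reads the final, corrected series is quietly trading on revisions that hadn't been published yet.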
Data Quality Matters
Garbage in, garbage out. No amount of sophisticated analysis fixes bad data.
Price data inconsistencies. Different exchanges show different prices for the same asset at the same time. Which one did you use? Which one would you actually trade on? Arbitrage opportunities in your backtest might just be data discrepancies.
Missing data and gaps. How does your backtest handle missing candles? Exchange outages? Flash crashes that were later “corrected”? These edge cases often matter more than normal conditions.
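At minimum, your pipeline should detect gaps explicitly rather than silently interpolating over them. A sketch for a supposedly regular 1-minute series, using hypothetical epoch-second timestamps:

```python
# Sketch: detect missing candles in a supposedly regular 1-minute series.
# Timestamps are hypothetical epoch seconds.

def find_gaps(timestamps, interval=60):
    """Return (prev_ts, next_ts) pairs where candles are missing."""
    gaps = []
    for prev, nxt in zip(timestamps, timestamps[1:]):
        if nxt - prev > interval:
            gaps.append((prev, nxt))
    return gaps

ts = [0, 60, 120, 300, 360]   # the candles at 180 and 240 are missing
gaps = find_gaps(ts)          # one gap, spanning 120 -> 300
```

Once a gap is flagged, you decide deliberately how to handle it (skip, forward-fill, or exclude the period) instead of letting the backtest trade through an outage as if liquidity existed.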
Free vs. paid data tradeoffs. Free data is often delayed, incomplete, or poorly maintained. Paid data isn’t perfect either, but it’s usually better. For serious strategy development, data quality is worth paying for.
Cryptocurrency-specific issues. Wash trading inflates volume figures. Exchange-reported prices can be manipulated. Stablecoin depegs create artificial arbitrage. Be skeptical of crypto data—it’s messier than traditional markets.
From Backtest to Live: Bridging the Gap
The gap between backtest and live performance is where strategies go to die. Expect it. Plan for it.
Paper trading: useful but limited. Paper trading tests your execution logic and emotional response, but not real fills. You’ll always get filled in paper trading. Real markets aren’t so generous.
Small-size live testing. The only true test is live trading. Start with position sizes small enough that losses don’t matter. You’re paying tuition to learn how your strategy actually behaves.
Set a slippage budget. Decide in advance: “I expect live performance to be X% worse than backtest due to execution costs I couldn’t fully model.” If live results are worse than your budget, investigate. If they’re better, be suspicious—you might be getting lucky.
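The budget only works if it's committed to before going live. A sketch of the comparison, with hypothetical returns and a hypothetical 2-point budget:

```python
# Sketch: a pre-committed slippage budget check. Returns and the budget
# are hypothetical; the point is that the threshold is set in advance.

def check_against_budget(backtest_return, live_return, budget=0.02):
    """Classify live performance vs backtest, given a degradation budget.

    budget: expected shortfall, e.g. 0.02 = "live may be 2 points worse".
    """
    gap = backtest_return - live_return
    if gap > budget:
        return "worse than budget: investigate execution and assumptions"
    if gap < 0:
        return "better than backtest: be suspicious, check for luck"
    return "within budget"

verdict_ok = check_against_budget(0.10, 0.09)   # 1-point gap, inside budget
verdict_bad = check_against_budget(0.10, 0.06)  # 4-point gap: investigate
```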
Track execution quality. Compare your intended entry price to your actual fill. Do this for every trade. The gap tells you how much your backtest was lying.
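A simple way to quantify that gap is per-trade slippage in basis points, signed so that positive always means "worse than intended." The trade records below are hypothetical:

```python
# Sketch: per-trade execution-quality log. Trade records are hypothetical.

def slippage_bps(intended_price, fill_price, side):
    """Adverse slippage in basis points; positive means you paid up."""
    sign = 1 if side == "buy" else -1
    return sign * (fill_price - intended_price) / intended_price * 10_000

trades = [
    {"side": "buy",  "intended": 100.00, "fill": 100.05},  # paid 5 bps more
    {"side": "sell", "intended": 100.00, "fill": 99.97},   # received 3 bps less
]
per_trade = [slippage_bps(t["intended"], t["fill"], t["side"]) for t in trades]
avg_bps = sum(per_trade) / len(per_trade)
# A persistent positive average is the measured size of your backtest's lie.
```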
When to Trust Your Backtest
Not all backtests are equally trustworthy. Here’s when to have more confidence:
The logic is explainable. You can articulate why the strategy should work, independent of the backtest results. The backtest confirms your thesis; it didn’t generate it.
Parameters are robust. Changing inputs slightly doesn’t dramatically change outputs. If RSI 14 works but RSI 15 fails, you’ve found noise, not signal.
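A robustness check can be as simple as sweeping the neighborhood around your chosen parameter. In this sketch, `run_backtest` is a hypothetical stand-in returning a made-up score; in practice it would be your real backtest:

```python
# Sketch of a robustness sweep: run the backtest at neighboring parameter
# values and compare scores. `run_backtest` is a hypothetical stand-in
# with a smooth, made-up response surface, purely for illustration.

def run_backtest(rsi_period: int) -> float:
    # Placeholder scoring function; substitute your actual backtest.
    return 1.0 - abs(rsi_period - 14) * 0.02

def neighborhood_scores(center, radius=2):
    """Score every parameter value within `radius` of the chosen one."""
    return {p: run_backtest(p) for p in range(center - radius, center + radius + 1)}

scores = neighborhood_scores(14)
spread = max(scores.values()) - min(scores.values())
# A small spread suggests a robust region; wild swings suggest noise.
```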
Multiple market regimes tested. Your strategy survived bull markets, bear markets, and sideways chop. It worked in 2020 and 2022, not just one favorable period.
Costs are fully loaded. After subtracting realistic fees, spread, and slippage, profit remains. Many strategies look good until you add costs, then go negative.
Out-of-sample validation passed. You tested on data you didn’t use for development. Once. Without peeking and adjusting. This is covered in depth in Overfitting: The Silent Strategy Killer.
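The mechanics are simple; the discipline is the hard part. A sketch of a strict chronological split, using a stand-in list of bars:

```python
# Sketch: a strict chronological train/holdout split. The data is a
# stand-in; the key rule is that the holdout is evaluated exactly once,
# after development is finished.

def split_in_out(bars, in_sample_frac=0.7):
    """Chronological split: never shuffle time-series data."""
    cut = int(len(bars) * in_sample_frac)
    return bars[:cut], bars[cut:]

bars = list(range(100))            # stand-in for 100 candles, oldest first
in_sample, out_of_sample = split_in_out(bars)
# Develop and iterate on in_sample only; touch out_of_sample once, at the end.
```

Note the split is by time, not by random shuffle: shuffling leaks future information into the development set and defeats the entire purpose.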
Common Failure Modes
- Perfecting forever, never going live. Backtesting is seductive. There’s always one more thing to optimize. At some point, you have to trade real money or admit you’re just playing with spreadsheets.
- Taking backtest results at face value. “The backtest made 200% per year” means nothing without knowing the assumptions. What costs? What slippage? What data quality?
- Dismissing execution costs as details. “I’ll adjust for that later.” You won’t. And even if you do, the psychology of seeing inflated backtest returns corrupts your judgment.
- Testing only favorable periods. A strategy developed on 2021 bull market data will look amazing. Test it on 2022. Test it on March 2020. Test it on boring sideways months. Reality includes all conditions.
- Confusing backtest iteration with research. Running 500 parameter combinations and picking the best one isn’t research—it’s overfitting with extra steps. Each iteration should test a specific hypothesis, not mine for results.