What Backtesting Actually Is
Backtesting means running a trading strategy against historical market data to see how it would have performed. Instead of risking real money to find out if a strategy works, you simulate thousands of trades using past price data and measure the results.
The concept is straightforward: if a strategy would have lost money over the last 12 months of historical data, it probably shouldn't be deployed with real capital. But the devil is in the implementation -- and that's where most trading bots cut corners.
The Candle-Close Problem
Most trading bots backtest against candle-close data only. For example, if you're testing on 1-hour candles, the bot checks the closing price of each hourly candle and decides whether a trade would have triggered.
Here's the problem: a 1-hour candle hides everything that happened within that hour. If BTC dropped 3% mid-candle before recovering to close flat, any stop-loss set at -2% would have triggered in real trading -- but the backtest running on candle-close data never sees that intracandle wick.
The result? Backtests that look profitable on paper but would have been stopped out repeatedly in live trading. Win rates appear inflated, drawdowns look smaller than reality, and traders deploy strategies based on numbers that never existed.
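The blind spot is easy to demonstrate. Here is a minimal sketch with hypothetical numbers: a 1-hour candle that closes flat but dips 3% intracandle, past a -2% stop-loss. A close-only check never sees the stop trigger.

```python
# Toy illustration (hypothetical numbers): a flat hourly candle
# hiding a -3% intracandle dip past a -2% stop-loss.
entry_price = 100.0
stop_loss = entry_price * 0.98      # stop at -2% -> 98.0

candle = {"open": 100.0, "high": 100.5, "low": 97.0, "close": 100.0}

# Candle-close test: only the close is inspected -- no stop seen.
stopped_close_only = candle["close"] <= stop_loss

# Intracandle test: the low reveals the wick that hit the stop.
stopped_intracandle = candle["low"] <= stop_loss

print(stopped_close_only)    # False -- the backtest thinks the trade survived
print(stopped_intracandle)   # True  -- live trading would have stopped out
```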
How Proper Backtesting Works
Correct backtesting builds higher-timeframe candles from granular base data -- ideally 1-second resolution, or at minimum 1-minute candles. When testing a strategy on 1-hour candles, the engine reconstructs each hourly candle from its underlying 3,600 one-second data points.
This approach captures:
- Intracandle stop-loss triggers -- wicks that would have hit your stop before the candle closed
- Intracandle take-profit fills -- price briefly touching your target before reversing
- Order fill sequence -- whether your entry or exit would have actually filled at the expected price given the intracandle price action
- Realistic slippage -- the gap between your intended fill price and the actual execution price during fast-moving markets
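The core of this approach can be sketched in a few lines: walk the base bars inside a candle in chronological order and report which level -- stop or target -- is touched first. The function name, the toy minute bars, and the pessimistic tie-break are assumptions for illustration, not any particular engine's implementation.

```python
# Sketch of intracandle fill checking for a long position:
# walk the base bars that make up one higher-timeframe candle
# in time order and find the first level touched.

def first_fill(base_bars, stop, target):
    """Return ("stop", price), ("target", price), or (None, None).

    base_bars: list of (high, low) tuples in chronological order.
    Conservative assumption: if one bar spans both levels, the
    stop is treated as filled first.
    """
    for high, low in base_bars:
        if low <= stop:        # pessimistic tie-break: stop wins
            return "stop", stop
        if high >= target:
            return "target", target
    return None, None

# Hypothetical minute bars inside one flat-looking hourly candle:
bars = [(100.2, 99.8), (100.0, 97.9), (99.5, 98.7), (100.4, 99.9)]
print(first_fill(bars, stop=98.0, target=101.0))  # ('stop', 98.0)
```

A close-only backtest of the same hourly candle would have reported no fill at all, since the candle closes between the stop and the target.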
The difference is significant. A strategy that shows a 68% win rate on candle-close backtesting may drop to 52% when tested with intracandle data -- the difference between a profitable strategy and a losing one after fees.
The Numbers: Backtests vs Reality
A study analyzing 888 algorithmic trading strategies found that backtested Sharpe Ratios had an R² of less than 0.025 when predicting live performance. In plain English: the backtest results explained less than 2.5% of the variation in real-world returns.
This doesn't mean backtesting is useless -- it means most backtesting is done poorly. The gap between backtest and reality comes from:
- Candle-close-only testing (missing intracandle moves)
- No fee simulation
- No slippage modeling
- Overfitting to historical patterns
- Survivorship bias in strategy selection
The Fee Trap Nobody Talks About
Fees are the silent strategy killer. On Binance, maker/taker fees with BNB discount run approximately 0.075% per side -- that's 0.15% for a complete roundtrip (buy + sell).
Now do the math: if your strategy averages 0.2% profit per trade before fees, you're giving up 75% of your gross profit to exchange fees alone. Your 0.2% winner becomes a 0.05% winner. Over hundreds of trades, this compounds into a massive performance drag.
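The arithmetic above, written out (the 0.2% edge is the hypothetical from this example; the fee rates are the Binance-with-BNB-discount figures quoted earlier):

```python
# A 0.2% gross edge per trade against a 0.15% roundtrip fee.
gross_edge = 0.002            # 0.2% average profit per trade
fee_per_side = 0.00075        # 0.075% per side
roundtrip_fee = 2 * fee_per_side

net_edge = gross_edge - roundtrip_fee
fee_share = roundtrip_fee / gross_edge

print(f"net edge per trade: {net_edge:.4%}")                   # 0.0500%
print(f"share of gross profit lost to fees: {fee_share:.0%}")  # 75%
```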
Any backtest that doesn't simulate fees is fantasy. And yet many bots either exclude fees entirely from their backtesting engine or use unrealistically low fee assumptions.
Overfitting: The Backtest That's Too Good
Overfitting happens when a strategy is tuned so precisely to historical data that it "memorizes" past market patterns instead of identifying general rules. The backtest shows spectacular results, but the strategy fails immediately in live trading because it was optimized for conditions that won't repeat exactly.
Signs of an overfitted strategy:
- Extremely high win rate (>80%) combined with high trade frequency
- Strategy uses many parameters (10+) that are all finely tuned
- Performance drops sharply when tested on a different time period
- Results vary wildly with small parameter changes
The antidote is out-of-sample testing: optimizing a strategy on one time period (the "in-sample" data) and then verifying it on a completely separate period (the "out-of-sample" data) that was never used during optimization.
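A minimal sketch of that split, assuming a chronologically ordered candle list (the helper name, the 70/30 ratio, and the commented-out `optimize`/`backtest` calls are illustrative placeholders, not a specific library's API):

```python
# Out-of-sample split: tune on the earlier slice only, then
# evaluate on data the optimizer never saw.

def split_in_out_of_sample(candles, in_sample_ratio=0.7):
    """Chronological split -- never shuffle time-series data."""
    cut = int(len(candles) * in_sample_ratio)
    return candles[:cut], candles[cut:]

candles = list(range(1000))   # stand-in for real candle data
in_sample, out_of_sample = split_in_out_of_sample(candles)

# best_params = optimize(strategy, in_sample)              # tune here only
# report = backtest(strategy, best_params, out_of_sample)  # verify here
print(len(in_sample), len(out_of_sample))  # 700 300
```

The split must be chronological: shuffling before splitting leaks future information into the in-sample period and defeats the purpose of the test.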
The 5 Metrics That Actually Matter
Win Rate
Percentage of trades that close in profit. Meaningless in isolation -- a 90% win rate with a 10:1 loss-to-win ratio is a losing strategy. Always evaluate alongside average win size vs average loss size.
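The per-trade expectancy makes this concrete. Using the hypothetical numbers from the example above (90% win rate, losses ten times the size of wins):

```python
# Expectancy per trade: win_rate * avg_win - loss_rate * avg_loss
win_rate = 0.90
avg_win = 1.0        # arbitrary units
avg_loss = 10.0      # 10:1 loss-to-win ratio

expectancy = win_rate * avg_win - (1 - win_rate) * avg_loss
print(round(expectancy, 4))   # -0.1 -> negative: loses money per trade
```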
Profit Factor
Gross profit divided by gross loss. A profit factor of 1.5 means the strategy earns $1.50 for every $1 it loses. Below 1.0 means the strategy is losing money. Above 2.0 is considered strong.
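Computed from a list of per-trade P&Ls (hypothetical trades for illustration):

```python
# Profit factor = gross profit / gross loss
trades = [150.0, -80.0, 40.0, -20.0, 60.0]

gross_profit = sum(t for t in trades if t > 0)   # 250.0
gross_loss = -sum(t for t in trades if t < 0)    # 100.0
profit_factor = gross_profit / gross_loss

print(profit_factor)   # 2.5 -> $2.50 earned for every $1 lost
```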
Maximum Drawdown
The largest peak-to-trough decline during the backtest period. A strategy with 60% maximum drawdown means your account could drop from $10,000 to $4,000 before recovering. This is the metric most bots conveniently omit.
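Maximum drawdown is a single pass over the equity curve, tracking the running peak (the equity values below are hypothetical, chosen to match the $10,000-to-$4,000 example):

```python
# Peak-to-trough maximum drawdown over an equity curve.
equity = [10_000, 9_500, 7_000, 4_000, 6_500, 11_000]

peak = equity[0]
max_drawdown = 0.0
for value in equity:
    peak = max(peak, value)                          # running high-water mark
    max_drawdown = max(max_drawdown, (peak - value) / peak)

print(f"{max_drawdown:.0%}")   # 60%
```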
Annualized Sharpe Ratio
Risk-adjusted return. Measures excess return per unit of risk. Important: the Sharpe Ratio must be annualized correctly based on the trading timeframe. A per-period Sharpe of 2.0 on daily data is not comparable to 2.0 on hourly data until both are annualized. Anything above 1.0 after fees is respectable for crypto.
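The standard annualization multiplies the per-period Sharpe by the square root of the number of trading periods per year. Since crypto trades 24/7, hourly data has 24 × 365 periods per year, which is why the same per-period figure annualizes so differently across timeframes:

```python
import math

def annualize_sharpe(sharpe_per_period, periods_per_year):
    """Scale a per-period Sharpe Ratio to an annual figure."""
    return sharpe_per_period * math.sqrt(periods_per_year)

# The same per-period Sharpe of 0.05 on two timeframes:
print(round(annualize_sharpe(0.05, 365), 2))        # daily:  0.96
print(round(annualize_sharpe(0.05, 24 * 365), 2))   # hourly: 4.68
```

This is why a report that states a Sharpe Ratio without naming the timeframe is meaningless.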
Longest Losing Streak
The maximum number of consecutive losing trades. Critical for psychological endurance and risk management. A strategy with a 15-trade losing streak requires conviction and proper position sizing to survive.
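Computing it is a simple run-length scan over the trade history (hypothetical P&L list for illustration):

```python
# Longest run of consecutive losing trades.
trades = [12, -5, -8, -3, 20, -4, -6, 15, -2, -9, -1, -7]

longest = current = 0
for pnl in trades:
    current = current + 1 if pnl < 0 else 0   # extend or reset the streak
    longest = max(longest, current)

print(longest)   # 4 consecutive losses at the end of this history
```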
Red Flags vs Green Flags
Red Flags
- No slippage model included
- No fee calculation in results
- Only cherry-picked profitable examples shown
- Maximum drawdown not reported
- No mention of testing methodology
- Sharpe Ratio not annualized or timeframe not specified
Green Flags
- Sharpe Ratio annualized with timeframe noted
- Maximum drawdown clearly reported
- Fee simulation with realistic exchange rates
- Slippage modeling included
- Out-of-sample testing results shown
- Both winning and losing periods displayed
The Bottom Line
Backtesting is essential -- but only when done correctly. A backtest without intracandle data, fee simulation, and slippage modeling is closer to fiction than prediction. Before trusting any bot's backtest results, ask: what data resolution was used, are fees included, and is there out-of-sample validation?
The bots that are transparent about their backtesting methodology -- including its limitations -- are the ones worth taking seriously. If a bot only shows you winning backtests with no drawdown data, that tells you everything you need to know.