What Backtesting Actually Is
Backtesting means running a trading strategy against historical market data to see how it would have performed. Instead of risking real money to find out if a strategy works, you simulate thousands of trades using past price data and measure the results.
The concept is straightforward: if a strategy would have lost money over the last 12 months of historical data, it probably shouldn't be deployed with real capital. But the devil is in the implementation -- and that's where most trading bots cut corners.
The Candle-Close Problem
Most trading bots backtest against candle-close data only. For example, if you're testing on 1-hour candles, the bot checks the closing price of each hourly candle and decides whether a trade would have triggered.
Here's the problem: a 1-hour candle hides everything that happened within that hour. If BTC dropped 3% mid-candle before recovering to close flat, any stop-loss set at -2% would have triggered in real trading -- but the backtest running on candle-close data never sees that intracandle wick.
The result? Backtests that look profitable on paper but would have been stopped out repeatedly in live trading. Win rates appear inflated, drawdowns look smaller than reality, and traders deploy strategies based on numbers that never existed.
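The blind spot is easy to demonstrate. Here is a minimal sketch with hypothetical numbers: a 1-hour candle that closes flat but dips 3% intracandle, past a -2% stop-loss. A close-only check never sees the stop trigger.

```python
# Toy illustration (hypothetical numbers): a flat hourly candle
# hiding a -3% intracandle dip past a -2% stop-loss.
entry_price = 100.0
stop_loss = entry_price * 0.98      # stop at -2% -> 98.0

candle = {"open": 100.0, "high": 100.5, "low": 97.0, "close": 100.0}

# Candle-close test: only the close is inspected -- no stop seen.
stopped_close_only = candle["close"] <= stop_loss

# Intracandle test: the low reveals the wick that hit the stop.
stopped_intracandle = candle["low"] <= stop_loss

print(stopped_close_only)    # False -- the backtest thinks the trade survived
print(stopped_intracandle)   # True  -- live trading would have stopped out
```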
How Proper Backtesting Works
Correct backtesting builds higher-timeframe candles from granular base data -- ideally 1-second resolution, or at minimum 1-minute candles. When testing a strategy on 1-hour candles, the engine reconstructs each hourly candle from its underlying 3,600 one-second data points.
This approach captures:
- Intracandle stop-loss triggers -- wicks that would have hit your stop before the candle closed
- Intracandle take-profit fills -- price briefly touching your target before reversing
- Order fill sequence -- whether your entry or exit would have actually filled at the expected price given the intracandle price action
- Realistic slippage -- the gap between your intended fill price and the actual execution price during fast-moving markets
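The core of this approach can be sketched in a few lines: walk the base bars inside a candle in chronological order and report which level -- stop or target -- is touched first. The function name, the toy minute bars, and the pessimistic tie-break are assumptions for illustration, not any particular engine's implementation.

```python
# Sketch of intracandle fill checking for a long position:
# walk the base bars that make up one higher-timeframe candle
# in time order and find the first level touched.

def first_fill(base_bars, stop, target):
    """Return ("stop", price), ("target", price), or (None, None).

    base_bars: list of (high, low) tuples in chronological order.
    Conservative assumption: if one bar spans both levels, the
    stop is treated as filled first.
    """
    for high, low in base_bars:
        if low <= stop:        # pessimistic tie-break: stop wins
            return "stop", stop
        if high >= target:
            return "target", target
    return None, None

# Hypothetical minute bars inside one flat-looking hourly candle:
bars = [(100.2, 99.8), (100.0, 97.9), (99.5, 98.7), (100.4, 99.9)]
print(first_fill(bars, stop=98.0, target=101.0))  # ('stop', 98.0)
```

A close-only backtest of the same hourly candle would have reported no fill at all, since the candle closes between the stop and the target.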
The difference is significant. A strategy that shows a 68% win rate on candle-close backtesting may drop to 52% when tested with intracandle data -- the difference between a profitable strategy and a losing one after fees.
The Numbers: Backtests vs Reality
A study analyzing 888 algorithmic trading strategies found that backtested Sharpe Ratios had an R² of less than 0.025 when predicting live performance. In plain English: the backtest results explained less than 2.5% of the variation in real-world returns.
This doesn't mean backtesting is useless -- it means most backtesting is done poorly. The gap between backtest and reality comes from:
- Candle-close-only testing (missing intracandle moves)
- No fee simulation
- No slippage modeling
- Overfitting to historical patterns
- Survivorship bias in strategy selection
The Fee Trap Nobody Talks About
Fees are the silent strategy killer. On Binance, maker/taker fees with BNB discount run approximately 0.075% per side -- that's 0.15% for a complete roundtrip (buy + sell).
Now do the math: if your strategy averages 0.2% profit per trade before fees, you're giving up 75% of your gross profit to exchange fees alone. Your 0.2% winner becomes a 0.05% winner. Over hundreds of trades, this compounds into a massive performance drag.
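The arithmetic above, written out (the 0.2% edge is the hypothetical from this example; the fee rates are the Binance-with-BNB-discount figures quoted earlier):

```python
# A 0.2% gross edge per trade against a 0.15% roundtrip fee.
gross_edge = 0.002            # 0.2% average profit per trade
fee_per_side = 0.00075        # 0.075% per side
roundtrip_fee = 2 * fee_per_side

net_edge = gross_edge - roundtrip_fee
fee_share = roundtrip_fee / gross_edge

print(f"net edge per trade: {net_edge:.4%}")                   # 0.0500%
print(f"share of gross profit lost to fees: {fee_share:.0%}")  # 75%
```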
Any backtest that doesn't simulate fees is fantasy. And yet many bots either exclude fees entirely from their backtesting engine or use unrealistically low fee assumptions.
Overfitting: The Backtest That's Too Good
Overfitting happens when a strategy is tuned so precisely to historical data that it "memorizes" past market patterns instead of identifying general rules. The backtest shows spectacular results, but the strategy fails immediately in live trading because it was optimized for conditions that won't repeat exactly.
Signs of an overfitted strategy:
- Extremely high win rate (>80%) combined with high trade frequency
- Strategy uses many parameters (10+) that are all finely tuned
- Performance drops sharply when tested on a different time period
- Results vary wildly with small parameter changes
The antidote is out-of-sample testing: optimizing a strategy on one time period (the "in-sample" data) and then verifying it on a completely separate period (the "out-of-sample" data) that was never used during optimization.
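A minimal sketch of that split, assuming a chronologically ordered candle list (the helper name, the 70/30 ratio, and the commented-out `optimize`/`backtest` calls are illustrative placeholders, not a specific library's API):

```python
# Out-of-sample split: tune on the earlier slice only, then
# evaluate on data the optimizer never saw.

def split_in_out_of_sample(candles, in_sample_ratio=0.7):
    """Chronological split -- never shuffle time-series data."""
    cut = int(len(candles) * in_sample_ratio)
    return candles[:cut], candles[cut:]

candles = list(range(1000))   # stand-in for real candle data
in_sample, out_of_sample = split_in_out_of_sample(candles)

# best_params = optimize(strategy, in_sample)              # tune here only
# report = backtest(strategy, best_params, out_of_sample)  # verify here
print(len(in_sample), len(out_of_sample))  # 700 300
```

The split must be chronological: shuffling before splitting leaks future information into the in-sample period and defeats the purpose of the test.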
The 5 Metrics That Actually Matter
Win Rate
Percentage of trades that close in profit. Meaningless in isolation -- a 90% win rate with a 10:1 loss-to-win ratio is a losing strategy. Always evaluate alongside average win size vs average loss size.
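The per-trade expectancy makes this concrete. Using the hypothetical numbers from the example above (90% win rate, losses ten times the size of wins):

```python
# Expectancy per trade: win_rate * avg_win - loss_rate * avg_loss
win_rate = 0.90
avg_win = 1.0        # arbitrary units
avg_loss = 10.0      # 10:1 loss-to-win ratio

expectancy = win_rate * avg_win - (1 - win_rate) * avg_loss
print(round(expectancy, 4))   # -0.1 -> negative: loses money per trade
```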
Profit Factor
Gross profit divided by gross loss. A profit factor of 1.5 means the strategy earns $1.50 for every $1 it loses. Below 1.0 means the strategy is losing money. Above 2.0 is considered strong.
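Computed from a list of per-trade P&Ls (hypothetical trades for illustration):

```python
# Profit factor = gross profit / gross loss
trades = [150.0, -80.0, 40.0, -20.0, 60.0]

gross_profit = sum(t for t in trades if t > 0)   # 250.0
gross_loss = -sum(t for t in trades if t < 0)    # 100.0
profit_factor = gross_profit / gross_loss

print(profit_factor)   # 2.5 -> $2.50 earned for every $1 lost
```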
Maximum Drawdown
The largest peak-to-trough decline during the backtest period. A strategy with 60% maximum drawdown means your account could drop from $10,000 to $4,000 before recovering. This is the metric most bots conveniently omit.
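Maximum drawdown is a single pass over the equity curve, tracking the running peak (the equity values below are hypothetical, chosen to match the $10,000-to-$4,000 example):

```python
# Peak-to-trough maximum drawdown over an equity curve.
equity = [10_000, 9_500, 7_000, 4_000, 6_500, 11_000]

peak = equity[0]
max_drawdown = 0.0
for value in equity:
    peak = max(peak, value)                          # running high-water mark
    max_drawdown = max(max_drawdown, (peak - value) / peak)

print(f"{max_drawdown:.0%}")   # 60%
```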
Annualized Sharpe Ratio
Risk-adjusted return. Measures excess return per unit of risk. Important: the Sharpe Ratio must be annualized correctly based on the trading timeframe. A per-period Sharpe of 2.0 on daily data is not comparable to 2.0 on hourly data until both are annualized. Anything above 1.0 after fees is respectable for crypto.
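The standard annualization multiplies the per-period Sharpe by the square root of the number of trading periods per year. Since crypto trades 24/7, hourly data has 24 × 365 periods per year, which is why the same per-period figure annualizes so differently across timeframes:

```python
import math

def annualize_sharpe(sharpe_per_period, periods_per_year):
    """Scale a per-period Sharpe Ratio to an annual figure."""
    return sharpe_per_period * math.sqrt(periods_per_year)

# The same per-period Sharpe of 0.05 on two timeframes:
print(round(annualize_sharpe(0.05, 365), 2))        # daily:  0.96
print(round(annualize_sharpe(0.05, 24 * 365), 2))   # hourly: 4.68
```

This is why a report that states a Sharpe Ratio without naming the timeframe is meaningless.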
Longest Losing Streak
The maximum number of consecutive losing trades. Critical for psychological endurance and risk management. A strategy with a 15-trade losing streak requires conviction and proper position sizing to survive.
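Computing it is a simple run-length scan over the trade history (hypothetical P&L list for illustration):

```python
# Longest run of consecutive losing trades.
trades = [12, -5, -8, -3, 20, -4, -6, 15, -2, -9, -1, -7]

longest = current = 0
for pnl in trades:
    current = current + 1 if pnl < 0 else 0   # extend or reset the streak
    longest = max(longest, current)

print(longest)   # 4 consecutive losses at the end of this history
```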
Red Flags vs Green Flags
Red Flags
- No slippage model included
- No fee calculation in results
- Only cherry-picked profitable examples shown
- Maximum drawdown not reported
- No mention of testing methodology
- Sharpe Ratio not annualized or timeframe not specified
Green Flags
- Sharpe Ratio annualized with timeframe noted
- Maximum drawdown clearly reported
- Fee simulation with realistic exchange rates
- Slippage modeling included
- Out-of-sample testing results shown
- Both winning and losing periods displayed
The Bottom Line
Backtesting is essential -- but only when done correctly. A backtest without intracandle data, fee simulation, and slippage modeling is closer to fiction than prediction. Before trusting any bot's backtest results, ask: what data resolution was used, are fees included, and is there out-of-sample validation?
The bots that are transparent about their backtesting methodology -- including its limitations -- are the ones worth taking seriously. If a bot only shows you winning backtests with no drawdown data, that tells you everything you need to know.