This post describes how to use the Python package zipline to backtest an equities trading strategy. Backtesting is the practice of evaluating a trading model or heuristic against past data. Although past performance is no guarantee of future performance, practitioners generally feel more confident in strategies which would have performed well had they been executed in the past (given information available at the time). Another excellent reason to use backtesting is to catch conspicuously bad mistakes (e.g., coding or math errors) that are likely to guarantee poor performance in the future.
This post covers the basics of using zipline in the context of trading equities. A future post will discuss using zipline to backtest a prediction markets trading strategy.
These instructions assume you are using zipline version 1.3.0.
Using zipline involves two distinct steps:
- Loading data
- Running a strategy which uses the data
Loading data from CSV
zipline can load arbitrary price data from a CSV file, provided the price data is formatted in a specific way. Fortunately, the required format is relatively common. The format is called OHLCV and has eight columns: date, open, high, low, close, volume, dividend, and split. Do not worry about the dividend and split columns. They will be uniformly 0.0 and 1.0, respectively. The remaining columns are self-explanatory. Here is an example line in the file:
2012-01-03,58.485714,58.92857,58.42857,58.747143,75555200,0.0,1.0
The symbol associated with the price data is taken from the filename. In this case, the symbol is AAPL.
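To make the column layout concrete, here is a minimal sketch that splits the example line into named fields. It assumes the column order date, open, high, low, close, volume, dividend, split, which is consistent with the neutral values 0.0 (no dividend) and 1.0 (no split) at the end of the example line. The names COLUMNS and parse_row are illustrative and not part of zipline.

```python
import csv
import io

# Assumed column order (dividend before split); see the note above.
COLUMNS = ["date", "open", "high", "low", "close", "volume", "dividend", "split"]

# The example line quoted in the text.
EXAMPLE_LINE = "2012-01-03,58.485714,58.92857,58.42857,58.747143,75555200,0.0,1.0"


def parse_row(line):
    """Split one CSV line into a dict keyed by column name."""
    values = next(csv.reader(io.StringIO(line)))
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = dict(zip(COLUMNS, values))
    # Keep the date as text; convert every other field to a float.
    return {key: (val if key == "date" else float(val)) for key, val in row.items()}


row = parse_row(EXAMPLE_LINE)
for name in COLUMNS:
    print(f"{name:>8}: {row[name]}")
```

A row whose dividend is not 0.0 or whose split is not 1.0 would describe a corporate action on that date, which this post does not cover.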
To make the data available to zipline for backtesting, we load (“ingest”) the data by placing the file in a subdirectory named daily inside a data directory and running the following command:
CSVDIR=/path/to/data python3 -m zipline ingest -b csvdir
Here /path/to/data is replaced with the appropriate directory. If /path/to/data is our directory, then AAPL.csv.gz is located at /path/to/data/daily/AAPL.csv.gz. (AAPL.csv.gz can be found in the zipline repository.) Note the required daily subdirectory. If you have done everything correctly, the command should print a message which begins with Loading custom pricing data: ....
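The directory layout can be sketched in a few lines of Python. This example writes a one-row AAPL.csv into a daily subdirectory of a temporary directory and prints the ingest command that would point at it. The helper write_symbol_csv is hypothetical, and the header row naming the columns is an assumption about what the bundle expects. An uncompressed .csv file is used for simplicity.

```python
import tempfile
from pathlib import Path

# Assumed header row; the data row is the example line from the text.
HEADER = "date,open,high,low,close,volume,dividend,split"
ROW = "2012-01-03,58.485714,58.92857,58.42857,58.747143,75555200,0.0,1.0"


def write_symbol_csv(root, symbol, lines):
    """Write <root>/daily/<symbol>.csv and return its path."""
    daily = Path(root) / "daily"  # the required subdirectory
    daily.mkdir(parents=True, exist_ok=True)
    target = daily / f"{symbol}.csv"  # the symbol comes from the filename
    target.write_text("\n".join(lines) + "\n")
    return target


root = tempfile.mkdtemp()
csv_path = write_symbol_csv(root, "AAPL", [HEADER, ROW])
print(csv_path)
# The ingest command from the text, pointed at the directory just created.
print(f"CSVDIR={root} python3 -m zipline ingest -b csvdir")
```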
Testing a Trading Strategy
Now that we have loaded historical price data we can test trading strategies.
Let’s test the following “strategy”: buy 10 shares of stock on Mondays, sell 10 shares on Wednesdays. (Who knows, maybe we’ve discovered that people are pessimistic on Mondays and tend to systematically undervalue equities.)
To implement this strategy we need to cast it into terms used by the zipline API. Fortunately for us the strategy is simple, so this will not be difficult. We know that zipline requires us to write a function which gets called every day, handle_data, where this function can buy and sell things. We can buy and sell shares with the order function. The difficult part here is querying zipline for the date(time), from which we calculate the day of the week. To retrieve the current datetime inside handle_data, we need to inspect an attribute, current_dt, of the data argument (an instance of BarData) which gets passed to our function. When dealing with daily data, as we are here, this datetime is the end of the current trading period for that day. Since AAPL is traded in the United States and trading closes at 16:00 New York time, an example data.current_dt would be
Let’s make a first attempt at writing a strategy. We will start with an empty initialize function (which zipline requires we define) and then craft our handle_data function. Here it is:
from zipline.api import order, record, symbol


def initialize(context):
    pass


def handle_data(context, data):
    # weekday() is a method of `datetime.datetime`, Mon = 0, Fri = 4
    weekday = data.current_dt.weekday()
    if weekday == 4:
        # place order on Friday for Monday execution
        order(symbol('AAPL'), 10)
    elif weekday == 1:
        # place order on Tuesday for Wednesday execution
        order(symbol('AAPL'), -10)
    record(AAPL=data.current(symbol('AAPL'), 'price'))
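The weekday logic in handle_data can be checked without zipline. The sketch below uses only the standard library and rests on two assumptions: that timestamps are expressed in UTC, and that New York is at a fixed UTC-5 offset (true in January, not during daylight saving time). The function action_for is a hypothetical stand-in that mirrors the branches of handle_data.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-5 offset for New York in January (no daylight saving).
EST = timezone(timedelta(hours=-5))

# Market close at 16:00 New York time on Tuesday 2012-01-03, converted to UTC.
close_ny = datetime(2012, 1, 3, 16, 0, tzinfo=EST)
close_utc = close_ny.astimezone(timezone.utc)


def action_for(dt):
    """Mirror the branches of handle_data: Friday buys, Tuesday sells."""
    weekday = dt.weekday()  # Mon = 0, ..., Fri = 4
    if weekday == 4:
        return "buy"   # order placed Friday, executed Monday
    if weekday == 1:
        return "sell"  # order placed Tuesday, executed Wednesday
    return None


print(close_utc.isoformat(), action_for(close_utc))
```

The orders are placed one trading day early because, as the comments in the strategy note, an order placed on one bar is executed on the next.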
If we save this code in a file strategy1.py we can backtest it over any period for which we have data. For example, we can test this minimal strategy over the first full trading week in January 2012 with:
The output of a zipline run is a pandas data frame which can be read with the pandas.read_pickle function. The following lines of code will open the data frame and print the following:
- The opening value of our portfolio (i.e., the cash we start with)
- The closing price of AAPL on Monday (at which we buy 10 shares)
- The closing price of AAPL on Wednesday (at which we sell 10 shares)
- The closing value of our portfolio
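One way to extract these four numbers is sketched below. It assumes the performance frame is indexed by date, has a portfolio_value column, and has an AAPL column created by the record call in the strategy. The helper summarize is hypothetical, and the demo frame holds placeholder numbers (not real prices) so the sketch runs without a backtest.

```python
import os

import pandas as pd


def summarize(perf):
    """Pick out the four quantities listed above from a performance frame."""
    weekdays = perf.index.weekday  # Monday = 0, Wednesday = 2
    return {
        "opening_value": perf["portfolio_value"].iloc[0],
        "aapl_monday": perf.loc[weekdays == 0, "AAPL"].iloc[0],
        "aapl_wednesday": perf.loc[weekdays == 2, "AAPL"].iloc[0],
        "closing_value": perf["portfolio_value"].iloc[-1],
    }


# Synthetic stand-in frame: six business days starting Friday 2012-01-06.
demo_index = pd.date_range("2012-01-06", periods=6, freq="B", tz="UTC")
demo = pd.DataFrame(
    {
        "portfolio_value": [100.0, 100.0, 101.0, 102.0, 102.0, 102.0],
        "AAPL": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    },
    index=demo_index,
)
demo_summary = summarize(demo)
print(demo_summary)

# If a real backtest output exists, summarize it the same way.
path = "strategy1_out.pickle"
if os.path.exists(path):
    perf = pd.read_pickle(path)
    for name, value in summarize(perf).items():
        print(name, value)
```

For a backtest longer than one week, summarize reports only the first Monday and the first Wednesday in the frame.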
Opening `portfolio_value`: 10000000.0
AAPL on Monday: 60.247
AAPL on Wednesday: 60.364
Closing `portfolio_value` 10000000.5469
So in this particular week we’ve earned a profit of 0.55, having made use of 602.47 (\(10 \times 60.247\)), ignoring fees and slippage. In annualized terms this is a return of about 2.8% which is better than the risk-free rate in early 2012 (around 1.9%). Of course we only backtested the algorithm for one week. (Our strategy is also very risky.) To run this strategy for an entire year we would change the end date:
zipline run -f strategy1.py -b csvdir --start 2012-1-6 --end 2012-12-28 -o strategy1_out.pickle
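Since runs over different periods differ only in their dates, the command line can be assembled programmatically. The sketch below uses only the flags that appear in the command above. The helper build_run_command is hypothetical, and the one-week end date 2012-1-13 is an assumption chosen to cover the first full trading week of January 2012.

```python
import shlex


def build_run_command(start, end, algo_file="strategy1.py",
                      bundle="csvdir", output="strategy1_out.pickle"):
    """Return the zipline run invocation as an argument list."""
    return [
        "zipline", "run",
        "-f", algo_file,   # file defining initialize and handle_data
        "-b", bundle,      # data bundle ingested earlier
        "--start", start,
        "--end", end,
        "-o", output,      # pickle file holding the performance frame
    ]


# Assumed dates: Friday 2012-01-06 through Friday 2012-01-13.
week_cmd = build_run_command("2012-1-6", "2012-1-13")
print(shlex.join(week_cmd))
```

The argument list can be passed to subprocess.run as is, which avoids shell quoting problems if a path contains spaces.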
Using zipline in a prediction market setting
To use zipline in a prediction market setting, one needs correctly formatted data. Obtaining current price information, including the bid-ask spread, is typically not difficult. PredictIt, for example, makes this data available in a machine-readable format (e.g., https://www.predictit.org/api/Market/3633/Contracts).