# Build a pairs trading strategy with Python

In today’s issue, I’m going to show you how to build a pairs trading strategy in Python.

Pairs trading (sometimes called statistical arbitrage) is a way of trading an economic relationship between two stocks. For example, two companies that manufacture a similar product with the same supply chain will be impacted by the same economic forces. Pairs trading tries to model that relationship and make money when the relationship temporarily breaks down.

Pairs trading relies on cointegration. Two time series are cointegrated when some linear combination of them is stationary, even if each series on its own is not. Stationarity describes a time series that has no trend, a constant variance through time, and no seasonality. The “pair” is that linear combination of both stocks: one you buy and one you sell.

### Pairs trading exploits periodic breakdowns in economic relationships.

An ideal pairs trading scenario is when two stocks are cointegrated. In other words, there is a stable linear combination between them. The strategy will enter trades if that relationship breaks down.
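To make "stable linear combination" concrete, here is a minimal sketch on simulated data. The two series, the shared random walk, and the 2-to-1 weighting are all made up for illustration; real stocks won't cancel this cleanly.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# A shared random walk drives both (hypothetical) price series
common = np.cumsum(rng.normal(0, 1, n))

# Each series = a multiple of the shared trend + its own stationary noise
stock_a = 50 + common + rng.normal(0, 0.5, n)
stock_b = 20 + 0.5 * common + rng.normal(0, 0.5, n)

# The linear combination A - 2 * B cancels the shared trend
spread = stock_a - 2.0 * stock_b

print(f"std of stock_a: {stock_a.std():.2f}")
print(f"std of spread:  {spread.std():.2f}")
```

Each series wanders with the random walk, but the combination stays close to a fixed level. That fixed level is what a pairs trade bets on.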

The secret to pairs trading is picking the right pairs to trade. To do this, traders start with buckets of stocks that are related economically. Then they use big data sets to crunch through millions of pairs to find anomalies to exploit.

And most of the time they use Python.

By reading this issue, you’ll be able to:

- Get stock price data
- Find cointegrated pairs
- Model the spread
- Trade the strategy

Let’s get started.

### Step 1: Get the data

Start by importing the libraries. statsmodels is a package used to build statistical models like linear regression and tests for cointegration. seaborn is a plotting library.

```
import numpy as np
import pandas as pd

import statsmodels.api as sm
from statsmodels.tsa.stattools import coint
from statsmodels.regression.rolling import RollingOLS

import yfinance as yf
import seaborn
import matplotlib.pyplot as plt
```

Next, get the data. Picking the pairs to test is the secret behind a good pairs trading strategy. For this example, I just use the FAANG stocks.

```
symbol_list = ['META', 'AMZN', 'AAPL', 'NFLX', 'GOOG']
# auto_adjust=False keeps the 'Adj Close' column in recent yfinance versions
data = yf.download(
    symbol_list,
    start='2014-01-01',
    end='2015-01-01',
    auto_adjust=False
)['Adj Close']
```

### Step 2: Find cointegrated pairs

The next step is to loop through the different combinations of pairs to test if they’re cointegrated (see Warning below).

```
def find_cointegrated_pairs(data):
    n = data.shape[1]
    score_matrix = np.zeros((n, n))
    pvalue_matrix = np.ones((n, n))
    keys = data.keys()
    pairs = []
    # Test each unique pair once (upper triangle of the matrices)
    for i in range(n):
        for j in range(i + 1, n):
            S1 = data[keys[i]]
            S2 = data[keys[j]]
            # coint returns the test statistic, the p-value, and critical values
            result = coint(S1, S2)
            score = result[0]
            pvalue = result[1]
            score_matrix[i, j] = score
            pvalue_matrix[i, j] = pvalue
            if pvalue < 0.05:
                pairs.append((keys[i], keys[j]))
    return score_matrix, pvalue_matrix, pairs
```

This function loops through a list of securities and tests for cointegration between all pairs. It returns a cointegration test score matrix, a p-value matrix, and any pairs for which the p-value was less than 0.05.

Next, run the function on our stock data.

`scores, pvalues, pairs = find_cointegrated_pairs(data)`

And plot the results on a heat map.

```
# Label the axes in the DataFrame's column order, which can differ from symbol_list
seaborn.heatmap(
    pvalues,
    xticklabels=data.columns,
    yticklabels=data.columns,
    cmap='RdYlGn_r',
    mask=(pvalues >= 0.05)
)
```

The heat map only shows pairs with a p-value below 0.05. It looks like AMZN and AAPL are cointegrated!

### Step 3: Model the spread

Now that you found a pair, run a linear regression using statsmodels and model the spread as a linear combination of AAPL and AMZN. b is the beta coefficient from the linear regression, otherwise known as the “hedge ratio.”

```
S1 = data.AMZN
S2 = data.AAPL

# Add an intercept column, then regress AAPL on AMZN
S1 = sm.add_constant(S1)
results = sm.OLS(S2, S1).fit()
S1 = S1.AMZN
b = results.params['AMZN']  # hedge ratio
spread = S2 - b * S1
```
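If the hedge ratio feels like a black box, this minimal sketch may help. With an intercept in the regression, the OLS slope is the covariance of the two series divided by the variance of the regressor. The series `x` and `y` below are simulated stand-ins for `S1` and `S2`, not real prices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-ins for S1 and S2 (illustrative values only)
x = 100 + np.cumsum(rng.normal(0, 1, 250))
y = 5 + 0.3 * x + rng.normal(0, 0.5, 250)

# OLS slope with an intercept: cov(x, y) / var(x)
b_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# The same slope from a degree-1 least-squares fit
b_fit, intercept = np.polyfit(x, y, 1)

print(f"slope from cov/var: {b_cov:.4f}")
print(f"slope from polyfit: {b_fit:.4f}")
```

Both numbers should agree, and both should land near the 0.3 used to build `y`.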

Now plot the spread.

```
spread.plot()
plt.axhline(spread.mean(), color='black')
plt.legend(['Spread']);
```

### Step 4: Build a simple trading strategy

You buy the spread when it gets “too low” and sell the spread when it gets “too high.” But what is too low and too high? Use the z-score to normalize the spread and use it as the trade signal. If you buy the pair, you buy one share of AAPL and sell b shares of AMZN. If you sell the pair, you sell one share of AAPL and buy b shares of AMZN.

```
def zscore(series):
    return (series - series.mean()) / np.std(series)

zscore(spread).plot()
plt.axhline(zscore(spread).mean(), color='black')
plt.axhline(1.0, color='red', linestyle='--')
plt.axhline(-1.0, color='green', linestyle='--')
plt.legend(['Spread z-score', 'Mean', '+1', '-1']);
```

The code above defines a function that computes the z-score of the spread, then plots it with entry thresholds at +1 and -1. The z-score measures how many standard deviations a value sits from the mean of the series.
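One thing to keep in mind: the z-score above uses the mean and standard deviation of the whole sample, so every signal "knows" about prices that come later. Here is a minimal sketch of a rolling alternative that only looks backward. The name `rolling_zscore`, the 30-day window, and the simulated spread are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

def rolling_zscore(series: pd.Series, window: int = 30) -> pd.Series:
    """Z-score each point against the trailing `window` observations only."""
    mean = series.rolling(window).mean()
    std = series.rolling(window).std()
    return (series - mean) / std

# Simulated spread, standing in for the real one
rng = np.random.default_rng(1)
demo_spread = pd.Series(rng.normal(0, 1, 120))

z = rolling_zscore(demo_spread, window=30)
print(z.dropna().head())
```

The first `window - 1` values are NaN because there isn't enough history yet. The window length is a tuning choice, not a rule.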

Next, estimate the equity curve of buying and selling the pair.

```
# Create a DataFrame with the signal and position size in the pair
trades = pd.concat([zscore(spread), S2 - b * S1], axis=1)
trades.columns = ["signal", "position"]

# Add a long and short position at the z-score levels
trades["side"] = 0.0
trades.loc[trades.signal <= -1, "side"] = 1
trades.loc[trades.signal >= 1, "side"] = -1
```

First, create a DataFrame with the signal and the position in the pair. Then add a column to the DataFrame and populate it with a 1 when the signal is less than or equal to -1 and -1 when the signal is greater than or equal to 1.

Finally, plot the equity curve.

```
returns = trades.position.pct_change() * trades.side
returns.cumsum().plot()
```

This pair is consistently losing money. This could mean a few things. First, there may be no linear combination that is stationary, and the cointegration test gave a false positive. Or, there is no real economic relationship driving the two stocks. Or, there is a linear combination and an economic relationship, but the relationship kept breaking down during the time frame you used.
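Before drawing conclusions from the equity curve, it can help to sanity-check how it's computed. The snippet above multiplies the percentage change of the spread by the same day's signal. Two things could distort that: percentage changes blow up when the spread is near zero, and trading on the same day's signal assumes you can act on information you only have at the close. Here is a minimal sketch of one alternative that lags the position by a day and measures P&L in price units. The function name `spread_pnl` and the toy numbers are hypothetical.

```python
import numpy as np
import pandas as pd

def spread_pnl(spread: pd.Series, side: pd.Series) -> pd.Series:
    """Cumulative P&L in price units per one unit of spread held."""
    # Yesterday's signal sets today's position (no look-ahead)
    held = side.shift(1).fillna(0.0)
    # Daily P&L is the position held times the change in the spread
    daily_pnl = held * spread.diff().fillna(0.0)
    return daily_pnl.cumsum()

# Toy spread and signals, for illustration only
demo_spread = pd.Series([10.0, 9.0, 8.0, 9.5, 10.5, 12.0, 11.0, 10.0])
demo_side = pd.Series([0.0, 0.0, 1.0, 1.0, 0.0, -1.0, -1.0, 0.0])

equity = spread_pnl(demo_spread, demo_side)
print(equity)
```

This ignores transaction costs and borrowing fees, so treat it as a rough check rather than a backtest.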

### Warning:

This is a toy example and there are some important caveats to note:

- Pairs trading assumes stock prices are cointegrated. Technically, that means a linear combination of the prices is stationary: it varies around a stable mean with a stable variance. In practice, traders use tools like Augmented Dickey-Fuller tests and Hurst exponents to check the spread for stationarity, and Kalman filters to estimate a hedge ratio that changes over time.
- Looping through pairs to find p-values increases the likelihood of incorrectly finding a significant p-value when many tests are run (a false positive). Start with the economic rationale of why two stocks should be cointegrated. In practice, traders pick a handful of stocks with economic links and test those.
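To put a rough number on the multiple-testing problem, here is a minimal sketch for the five stocks in this issue. It treats the tests as independent, which is a simplifying assumption: pairs that share a stock are not independent. The Bonferroni correction shown is one common adjustment, not the only one.

```python
from itertools import combinations

symbols = ["META", "AMZN", "AAPL", "NFLX", "GOOG"]

# Every unique pair gets one cointegration test
pairs_tested = list(combinations(symbols, 2))
n_tests = len(pairs_tested)

alpha = 0.05

# Chance of at least one false positive if the tests were independent
family_wise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni correction: divide the threshold by the number of tests
bonferroni_alpha = alpha / n_tests

print(f"{n_tests} pairs tested")
print(f"P(at least one false positive) ~ {family_wise_error:.2f}")
print(f"Bonferroni threshold: {bonferroni_alpha:.4f}")
```

Even with five stocks, the chance of a spurious "hit" is large. With thousands of pairs, it's close to certain.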

Well, that's it for today. I hope you enjoyed it.

See you again next week.