What alpha is really (and how to find it)

I was up late checking another factor signal last night and realized how easy it is to fool yourself with a pretty backtest when you skip the right analysis.

A lot of beginners aren’t sure whether their trading signals really predict returns or just look lucky. It’s tough to figure out which ideas work in real markets and which will fail fast.

In Getting Started With Python for Quant Finance, students learn how to use Python tools like pandas, numpy, Alphalens, and disciplined backtesting so they can actually test factor signals the right way.

They build and validate factors, measure predictive power with real metrics like the information coefficient, and avoid mistakes like overfitting or using the wrong data window.

Today’s newsletter walks through a piece of the end-to-end process my students master in full.

By reading today’s newsletter, you’ll get Python code to measure factor predictive power with the information coefficient and validate real alpha.

Let's go!

What alpha is really (and how to find it)

Factor alpha is the measurable predictive power of a quant trading signal. The information coefficient (IC) quantifies it as the correlation, usually a Spearman rank correlation, between your signal's values and the returns that follow. Tools like Alphalens in Python make it easy to measure and validate the performance of your factors in practice.
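To make the definition concrete, here is a minimal sketch that computes an IC on synthetic data. The factor values, the forward returns, and the -0.2 relationship between them are all made up for illustration; nothing here comes from real market data.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
n_obs = 2000

# Hypothetical factor scores and forward returns (synthetic, not market data).
factor = rng.normal(size=n_obs)
# Forward returns are constructed to depend weakly and negatively on the factor.
forward_returns = -0.2 * factor + rng.normal(size=n_obs)

# The IC is the Spearman rank correlation between the two series.
ic, p_value = spearmanr(factor, forward_returns)
print(f"IC: {ic:.3f}, p-value: {p_value:.4g}")
```

Because the returns were built with a negative dependence on the factor, the IC comes out negative. Real signals are rarely this clean.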

Understanding where the information coefficient comes from is key.

The hunt for robust alpha traces back to early quant researchers in the 1970s who needed quick, objective ways to measure return predictions. The IC, long championed by practitioners at AQR and in academic papers in The Journal of Portfolio Management, became a standard tool for testing whether a factor's rankings really predict forward returns.

Today, IC analysis is core to industry workflows.

Professionals use rolling IC analysis and quantile performance checks to assess if their signals hold up out-of-sample. They run sector-neutral tests, analyze decay, and use robustness checks before trusting a signal in live trading.
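Two of these checks, rolling IC and a quantile comparison, can be sketched on synthetic data. The panel below, the 30 hypothetical symbols, the 63-day smoothing window, and the five buckets are assumptions chosen for illustration. The sketch relies on the fact that a Spearman correlation is a Pearson correlation computed on ranks.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_days, n_symbols, n_buckets = 250, 30, 5
dates = pd.bdate_range("2023-01-02", periods=n_days)
tickers = [f"S{i:02d}" for i in range(n_symbols)]  # hypothetical symbols

# Synthetic panel: rows are dates, columns are symbols.
factor = pd.DataFrame(
    rng.normal(size=(n_days, n_symbols)), index=dates, columns=tickers
)
forward_returns = -0.1 * factor + rng.normal(size=(n_days, n_symbols))

# Cross-sectional IC: rank across symbols on each date, then correlate the ranks.
ranked_factor = factor.rank(axis=1)
ranked_returns = forward_returns.rank(axis=1)
daily_ic = ranked_factor.corrwith(ranked_returns, axis=1)

# Daily ICs are noisy, so smooth them with a rolling mean.
rolling_ic = daily_ic.rolling(63).mean()

# Quantile check: bucket each date's ranks into equal groups (1 = lowest scores)
# and compare the average forward return per bucket.
buckets = np.ceil(ranked_factor * n_buckets / n_symbols).astype(int)
long = pd.DataFrame(
    {
        "bucket": buckets.to_numpy().ravel(),
        "forward_return": forward_returns.to_numpy().ravel(),
    }
)
quantile_returns = long.groupby("bucket")["forward_return"].mean()

print(rolling_ic.dropna().tail())
print(quantile_returns)
```

A signal that holds up should show a rolling IC that keeps the same sign over time and bucket returns that move steadily from the lowest bucket to the highest.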

Let's see how it works with Python.

Imports and setup

This block imports all the libraries needed for data handling, analysis, and visualization, and suppresses warnings to keep output readable.

import numpy as np
import pandas as pd
import yfinance as yf
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import spearmanr
import warnings
warnings.filterwarnings("ignore")

Here, we set up all our tools. Numpy and pandas handle data, yfinance brings in market data, seaborn and matplotlib handle plotting, and scipy.stats helps with statistics. We keep output uncluttered by hiding warnings so we can focus on results.

Download and prepare market data

This block pulls daily data for a list of large technology companies from Yahoo Finance, then restructures the data so it's easy to analyze by date and ticker symbol.

symbols = ["AAPL", "MSFT", "GOOGL", "META", "NVDA", "AMZN", "TSLA"]

# Download daily bars, then move column level 1 (the ticker) into the rows
prices = (
    yf.download(
        symbols,
        start="2020-01-01",
        end="2024-12-31",
        auto_adjust=False,
        progress=False,
    )
    .stack(level=1)
    .reset_index()
    .set_index("Date")
)
prices.columns = [col.lower() for col in prices.columns]

# Index by (ticker, Date) so each symbol's history is contiguous and sorted
grouped_prices = (
    prices
    .set_index("ticker", append=True)
    .reorder_levels(["ticker", "Date"])
    .sort_index(level=[0, 1])
)

We're collecting daily price data for key symbols like AAPL, MSFT, and TSLA, covering five years from 2020 through 2024. The data is indexed by ticker and then by date, with all column names set to lowercase for consistency. That structure lets us pull one company's full history with a single lookup and run per-ticker calculations without mixing symbols.
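If the reshaping is hard to picture, this small sketch applies the same chain of calls to a made-up wide table. The two hypothetical tickers (AAA, BBB), the values, and the "Price"/"Ticker" column level names are assumptions standing in for what the download returns.

```python
import numpy as np
import pandas as pd

# Made-up wide table: one column per (field, ticker) pair, one row per date.
dates = pd.date_range("2024-01-02", periods=3, name="Date")
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["AAA", "BBB"]], names=["Price", "Ticker"]
)
wide = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns
)

# Same steps as above: ticker level into rows, lowercase names, (ticker, Date) index.
long = wide.stack(level=1).reset_index().set_index("Date")
long.columns = [col.lower() for col in long.columns]
panel = (
    long
    .set_index("ticker", append=True)
    .reorder_levels(["ticker", "Date"])
    .sort_index()
)

# Selecting on the first index level returns one symbol's history by date.
one_symbol = panel.loc["AAA"]
print(panel)
print(one_symbol)
```

The `panel.loc["AAA"]` lookup is the same pattern used later to select a single symbol for plotting.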

Calculate mean reversion and returns

This block calculates how far each stock's price is from its average over a rolling one-month window and measures returns over several holding periods for later analysis.

window = 22


def mean_reversion(x):
    # Rolling z-score: distance from the rolling mean in rolling standard deviations
    rolling = x.rolling(window, min_periods=window)
    return (x - rolling.mean()) / rolling.std()


grouped_prices["factor_score"] = (
    grouped_prices.groupby("ticker")["close"].transform(mean_reversion)
)

# Trailing returns over each holding period, computed per ticker
lags = [1, 5, 10, 22, 42, 63, 126]
for lag in lags:
    grouped_prices[f"return_{lag}d"] = (
        grouped_prices
        .groupby(level="ticker")["close"]
        .pct_change(lag)
    )

# Shift each trailing return back by its own horizon to get the forward return
for t in lags:
    grouped_prices[f"target_{t}d"] = (
        grouped_prices
        .groupby(level="ticker")[f"return_{t}d"]
        .shift(-t)
    )

grouped_prices.dropna(inplace=True)

We use a rolling 22-day window, roughly one month, to measure how far a stock's closing price sits from its own recent average, expressed in rolling standard deviations (a z-score). This tells us how "unusual" the current price is. We also calculate returns for each company over periods from 1 to 126 trading days to explore short- and long-term movements.
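Here is the same rolling z-score on a tiny made-up series, small enough to check by hand. The five prices and the 3-day window are illustrative assumptions, not the settings used above.

```python
import pandas as pd

# Made-up closing prices; the last one jumps well above the recent average.
close = pd.Series([10.0, 11.0, 12.0, 13.0, 20.0])
window = 3

rolling = close.rolling(window, min_periods=window)
z_score = (close - rolling.mean()) / rolling.std()

# The first window - 1 values are NaN because the window is not yet full.
# At position 2 the window is [10, 11, 12]: mean 11, sample std 1, so z = 1.
print(z_score)
```

Note that each window includes the current price, so a large jump also raises the mean and standard deviation it is compared against.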

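By shifting each return series backward by its own horizon, we set up "targets" representing forward returns: the value on a given date is the return earned over the following days. Last, we remove all rows with missing values to keep our analysis clean. Because the longest horizon is 126 days, this trims the first and last 126 trading days of each ticker's history.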
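The shift is easy to get backwards, so here is a small sketch on made-up prices that checks the alignment. The four prices and the 1-day horizon are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up closing prices on four consecutive business days.
dates = pd.bdate_range("2024-01-01", periods=4)
close = pd.Series([100.0, 110.0, 99.0, 108.9], index=dates)
horizon = 1

# Trailing return: today's close relative to the close `horizon` days ago.
trailing = close.pct_change(horizon)

# Forward return: the trailing return that will be known `horizon` days from now.
forward = trailing.shift(-horizon)

# Direct definition for comparison: the future close relative to today's close.
direct = close.shift(-horizon) / close - 1

print(pd.DataFrame({"close": close, "trailing": trailing, "forward": forward}))
print(np.allclose(forward, direct, equal_nan=True))
```

The last row has no forward return because the future price is not available yet, which is why those rows disappear when missing values are dropped.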

Visualize and evaluate relationships

This block visualizes the link between recent price deviations and future returns for Tesla, then measures how well our signal predicts returns for each symbol using a statistical test.

target = "target_22d"
metric = "factor_score"
j = sns.jointplot(x=metric, y=target, data=grouped_prices.loc["TSLA"])
plt.tight_layout()

The result is a plot that looks like this.

Now we can assess the Spearman rank correlation between the factor and forward returns for the chosen target period, here 22 days. This is the essence of alpha: is there a relationship between our factor and forward returns? For a mean reversion factor, we expect a negative one. The further the price extends above its mean, the more we expect it to fall back, which shows up as lower forward returns.

# Spearman rank correlation between the factor and the target, one row per ticker
results = (
    grouped_prices
    .groupby("ticker")
    .apply(lambda x: spearmanr(x[metric], x[target]))
    .apply(pd.Series)
    .round(4)
)
results.columns = ["statistic", "p_value"]
results.sort_values("statistic", ascending=False)

We run a statistical test for each company to measure the strength of the connection between our price deviation measure and future returns. The results table is sorted by correlation from highest to lowest, so the symbols with the strongest mean reversion, those with the most negative values, appear at the bottom. This gives us a clear view of which stocks' price moves are most likely to revert and which signals might be meaningful.

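We are looking for a negative correlation coefficient (the statistic column) that is statistically significant, meaning a small p_value. Note that the statistic returned by spearmanr is the rank correlation itself, not a t-statistic.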
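The same test can be repeated across several holding periods to see how the relationship changes with horizon. This is a sketch only: `ic_by_horizon` is a hypothetical helper, it pools all rows instead of grouping by ticker, and the demo table is synthetic. The only thing taken from the code above is the `factor_score` and `target_{lag}d` column naming.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def ic_by_horizon(df, factor_col, lags):
    """Pooled Spearman IC of factor_col against each target_{lag}d column."""
    rows = {}
    for lag in lags:
        target_col = f"target_{lag}d"
        pair = df[[factor_col, target_col]].dropna()
        statistic, p_value = spearmanr(pair[factor_col], pair[target_col])
        rows[lag] = {"statistic": statistic, "p_value": p_value, "n_obs": len(pair)}
    return pd.DataFrame.from_dict(rows, orient="index").rename_axis("lag_days")


# Synthetic stand-in for the real table: the link to the factor is built to
# weaken as the horizon grows.
rng = np.random.default_rng(1)
n_obs = 1000
demo = pd.DataFrame({"factor_score": rng.normal(size=n_obs)})
for lag, strength in [(1, -0.30), (5, -0.15), (22, -0.05)]:
    demo[f"target_{lag}d"] = strength * demo["factor_score"] + rng.normal(size=n_obs)

ic_table = ic_by_horizon(demo, "factor_score", [1, 5, 22])
print(ic_table.round(4))
```

With the real table you would pass `grouped_prices` and the `lags` list. Keep in mind that the p-values assume independent observations, and multi-day forward returns measured on consecutive days overlap, so treat them as optimistic.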

Your next steps

You can now spot mean reversion signals and test their power across big tech stocks. Next, swap in different symbols or adjust the rolling window to see how results shift. Try focusing on shorter holding periods to test if the relationship changes for quick trades. This hands-on tweak will show you which setups hold up in fresh market conditions.
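As a starting point for that experiment, here is a sketch that wraps the factor and target calculation in one function and loops over a few window lengths. `mean_reversion_ic` is a hypothetical helper, and the price series is a synthetic mean-reverting process built for the demo, so the numbers say nothing about real stocks.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def mean_reversion_ic(close, window, horizon):
    """Spearman IC between a rolling z-score and the forward return."""
    rolling = close.rolling(window, min_periods=window)
    factor = (close - rolling.mean()) / rolling.std()
    target = close.pct_change(horizon).shift(-horizon)
    pair = pd.concat([factor, target], axis=1, keys=["factor", "target"]).dropna()
    statistic, _ = spearmanr(pair["factor"], pair["target"])
    return statistic


# Synthetic prices: deviations from a level of 100 decay by 10% per day.
rng = np.random.default_rng(7)
n_days = 3000
shocks = rng.normal(size=n_days)
deviation = np.zeros(n_days)
for t in range(1, n_days):
    deviation[t] = 0.9 * deviation[t - 1] + shocks[t]
close = pd.Series(100.0 + deviation)

# Compare a few rolling windows at a fixed 5-day holding period.
ic_by_window = {w: mean_reversion_ic(close, w, horizon=5) for w in (10, 22, 63)}
for w, ic in ic_by_window.items():
    print(f"window={w:>2}  IC={ic:.3f}")
```

To try it on real data, pass one ticker's close column, for example `grouped_prices.loc["TSLA", "close"]`, in place of the synthetic series.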