Selective Multiple Testing: Inference for Large Panels with Many Covariates (Paper) (Code) (Slides) Co-author: Markus Pelger
R&R at Management Science.
We propose Panel Multiple Testing, a method for selecting covariates that explain a large cross-section with false discovery control. In our empirical asset pricing study, we select a sparse set of risk factors from a factor zoo of 114 candidates to explain the excess returns of 243 doubly-sorted portfolios.
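To make the idea of selecting covariates under false discovery control concrete, the sketch below applies the classical Benjamini-Hochberg step-up procedure to a single family of p-values. This is a swapped-in textbook technique, not the Panel Multiple Testing procedure itself, which handles a full panel of units and covariates. The function name `benjamini_hochberg` and the p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level alpha.

    Classical Benjamini-Hochberg step-up rule for one family of tests
    (an illustrative stand-in, not Panel Multiple Testing).
    """
    m = len(p_values)
    # Rank hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= alpha * k / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff])


# Hypothetical p-values for five candidate covariates.
p_vals = [0.001, 0.20, 0.012, 0.74, 0.025]
selected = benjamini_hochberg(p_vals, alpha=0.05)
print(selected)
```

With these made-up inputs, the covariates at positions 0, 2, and 4 are selected; the remaining two are not.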
Stanford GSB, CityU Hong Kong; NASMES, AMES, INFORMS, Western Conference on Mathematical Finance, NBER-NSF SBIES, California Econometrics Conference, Stanford HAI Financial Services Industry Review
Graph Machine Learning for Asset Pricing: Traversing the Supply Chain (Paper) (Code) Co-authors: Agostino Capponi, Jose Sidaoui
Major Revision at Journal of Financial Economics (JFE).
We develop a nonparametric method to aggregate firm characteristics across a large supply chain network to explain cross-sectional expected returns. Each firm receives a pricing signal, nonlinearly constructed from the characteristics of neighboring firms within $d$ hops on the network. We find that $d = 3$, which captures network effects up to the third order, balances bias reduction from higher-order relations against variance from added complexity. Our model yields a portfolio sorted on ML-estimated firm-level returns that condition on both historical supply chain data and firm characteristics. We achieve an out-of-sample Sharpe ratio gain of over 16% relative to direct-link models and outperform the Fama–French five-factor and PCA benchmarks. We find that the ML-managed portfolio improves mean-variance efficiency, as measured by the Sharpe ratio. Lastly, we show that the conditional mean return estimates of more central firms are 55% more sensitive to missing supply chain links than those of peripheral firms in the supply chain graph.
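As a rough illustration of what a $d$-hop neighborhood means, the sketch below collects all firms within $d$ hops of a source firm by breadth-first search and averages one scalar characteristic over them. The plain average is a linear placeholder for the paper's nonlinear, learned aggregation, and the graph, the firm labels, the feature values, and the function names `d_hop_neighbors` and `neighborhood_signal` are all hypothetical.

```python
from collections import deque


def d_hop_neighbors(adjacency, source, d):
    """Map each node reachable from `source` in 1..d hops to its hop distance."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == d:
            continue  # do not expand beyond d hops
        for nbr in adjacency.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[source]  # the source firm is not its own neighbor
    return dist


def neighborhood_signal(adjacency, features, source, d=3):
    """Average a scalar characteristic over the d-hop neighborhood.

    A simple mean is used here purely as a placeholder aggregator.
    """
    hops = d_hop_neighbors(adjacency, source, d)
    if not hops:
        return 0.0
    return sum(features[n] for n in hops) / len(hops)


# Hypothetical supply chain: A - B - C - D - E (undirected links).
adjacency = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
features = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 6.0, "E": 100.0}

neighbors = d_hop_neighbors(adjacency, "A", 3)
signal = neighborhood_signal(adjacency, features, "A", d=3)
print(neighbors, signal)
```

In this toy chain, firm E sits four hops from A, so its characteristic does not enter A's signal when $d = 3$.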
UT Austin, UNC, Baruch; SoFiE, NFA, EFA, Inaugural Finance Research Revolution Conference at Vitznau, Switzerland, INFORMS, Luohan Academy
The Nonstationarity-Complexity Tradeoff in Return Prediction (Paper) Co-authors: Agostino Capponi, Chengpiao Huang, Jose Sidaoui, Kaizheng Wang
We investigate machine learning models for stock return prediction in non-stationary environments, revealing a fundamental nonstationarity-complexity tradeoff: complex models reduce misspecification error but require longer training windows that introduce stronger non-stationarity. We resolve this tension with a novel model selection method that jointly optimizes model class and training window size using a tournament procedure that adaptively evaluates candidates on non-stationary validation data. Our theoretical analysis demonstrates that this approach balances misspecification error, estimation variance, and non-stationarity, performing close to the best model in hindsight. Applying our method to 17 industry portfolio returns, we consistently outperform standard rolling-window benchmarks, improving out-of-sample $R^2$ by 14–23% on average. During NBER-designated recessions, the improvements are substantial: our method achieves positive $R^2$ during the Gulf War recession while the benchmarks' $R^2$ is negative, improves $R^2$ in absolute terms by at least 80 bps during the 2001 recession, and delivers superior performance during the 2008 Financial Crisis. Economically, a trading strategy based on our selected model generates 31% higher cumulative returns on average across the industries.
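The interaction between model complexity and training window can be seen in a small sketch that searches jointly over (window, model class) pairs using one-step-ahead validation error. It uses a plain grid search on a holdout period, not the paper's adaptive tournament procedure, and it takes polynomial degree as a stand-in for model complexity. The synthetic series with a trend reversal, the candidate grid, and the function names `one_step_forecast` and `select_candidate` are assumptions made for illustration.

```python
import numpy as np


def one_step_forecast(history, window, degree):
    """Fit a polynomial of the given degree in time to the last `window`
    observations and extrapolate one step ahead."""
    y = np.asarray(history[-window:], dtype=float)
    t = np.arange(len(y))
    coeffs = np.polyfit(t, y, degree)
    return float(np.polyval(coeffs, len(y)))


def select_candidate(series, candidates, n_val):
    """Return the (window, degree) pair with the lowest validation MSE over
    the last `n_val` observations, refitting before each forecast."""
    series = np.asarray(series, dtype=float)
    start = len(series) - n_val
    scores = {}
    for window, degree in candidates:
        errors = [
            series[t] - one_step_forecast(series[:t], window, degree)
            for t in range(start, len(series))
        ]
        scores[(window, degree)] = float(np.mean(np.square(errors)))
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic non-stationary series: upward trend, then a reversal at t = 30.
series = [float(t) if t < 30 else float(60 - t) for t in range(60)]

# Candidate (window, degree) pairs: short vs long window, constant vs linear.
candidates = [(5, 0), (5, 1), (40, 0), (40, 1)]
best, scores = select_candidate(series, candidates, n_val=10)
print(best)
```

In this toy setting the long window mixes both regimes, so the linear model fit on the short window wins. With noisy data and richer model classes the short window would cost estimation variance, which is the tension a joint selection rule has to balance.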