Numerai releases a new version of the dataset, enhancing model performance and profitability
Author: Numerai
Compiled by: ChainCatcher
The Numerai dataset contains decades of historical data from global stock markets. Machine learning models trained on the dataset learn to predict stock returns and earn cryptocurrency (NMR) based on their performance in the Numerai tournament.
The performance of models on Numerai is driven by two things: the quality and quantity of information embedded in the dataset, and the skill and creativity of the model creators in transforming that information into predictions.
Today, we are releasing a new version of the Numerai dataset. It significantly increases the amount of embedded information, with 3 times the features and 5 times the training data, and opens up a whole new dimension of research with 20 times as many targets.
We hope the new dataset will greatly enhance the performance and profitability of all models, and we can’t wait to start a new season of research with everyone in the community!
Whether you are a new or an existing user, the easiest way to get started with the new dataset is to use our example script repository on GitHub.
In this repo, you will find examples of how to download the data, train the new official example models, calculate validation metrics, generate predictions, and upload submissions back to Numerai.
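To make the "calculate validation metrics" step more concrete, here is a minimal sketch in plain Python. It assumes validation rows can be grouped by time period and that a model is scored by the correlation between its predictions and the targets within each period. The tuple layout, the function names, and the choice of summary statistics are all assumptions made for illustration, not the repository's actual code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant input; treated as zero in this sketch
    return cov / (var_x * var_y) ** 0.5


def per_period_scores(rows):
    """rows: iterable of (period, prediction, target) tuples (hypothetical layout).

    Returns {period: correlation between predictions and targets in that period}.
    """
    grouped = defaultdict(lambda: ([], []))
    for period, pred, target in rows:
        grouped[period][0].append(pred)
        grouped[period][1].append(target)
    return {p: pearson(preds, targets) for p, (preds, targets) in sorted(grouped.items())}


def summarize(scores):
    """Mean per-period score and its mean/std ratio across periods."""
    values = list(scores.values())
    avg = mean(values)
    spread = pstdev(values)
    return {"mean": avg, "sharpe": avg / spread if spread > 0 else float("nan")}


# Toy data: the model is right in period 1 and wrong in period 2.
rows = [
    (1, 0.1, 0.0), (1, 0.5, 0.5), (1, 0.9, 1.0),
    (2, 0.2, 1.0), (2, 0.4, 0.5), (2, 0.8, 0.0),
]
scores = per_period_scores(rows)
summary = summarize(scores)
```

Scoring each period separately, rather than pooling all rows, shows whether a model performs consistently over time or only in a few periods.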
There is also a new analysis and tips notebook that will guide you through the dataset and explain advanced concepts such as feature exposure, ensembling, time-series cross-validation, and stacking.
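Two of these concepts can be sketched in a few lines of plain Python. The sketch takes feature exposure to mean the correlation between a model's predictions and each input feature, and takes time-series cross-validation to mean splits in which the training periods always come before the test periods, with a small gap between them. Both definitions, and every name and parameter below (`embargo`, `n_splits`, the feature names), are assumptions for illustration; the notebook may define and implement them differently.

```python
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def feature_exposure(features, predictions):
    """features: {name: list of values}. Returns {name: correlation with predictions}."""
    return {name: pearson(col, predictions) for name, col in features.items()}


def time_ordered_splits(periods, n_splits=3, embargo=1):
    """Yield (train_periods, test_periods) with training strictly earlier than test.

    `embargo` drops that many periods just before each test block, so that
    targets spanning several periods cannot leak into the training set.
    """
    ordered = sorted(set(periods))
    block = len(ordered) // (n_splits + 1)
    if block == 0:
        raise ValueError("not enough distinct periods for the requested splits")
    for k in range(1, n_splits + 1):
        test_start = k * block
        # The last split absorbs any leftover periods.
        test_end = test_start + block if k < n_splits else len(ordered)
        train_end = max(0, test_start - embargo)
        yield ordered[:train_end], ordered[test_start:test_end]


# Toy example: predictions track f_a exactly and are unrelated to f_b.
predictions = [0.1, 0.2, 0.3, 0.4]
exposure = feature_exposure({"f_a": [0, 1, 2, 3], "f_b": [1, 0, 0, 1]}, predictions)

splits = list(time_ordered_splits(range(1, 9), n_splits=3, embargo=1))
```

A model whose predictions correlate strongly with a single feature depends heavily on that feature continuing to work, which is why exposure is worth checking alongside raw performance.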
You can download each file in the new dataset individually, in either CSV or Parquet format. The dataset is now available through our GraphQL API and Python client.
You can test your model's performance on the new validation data using our improved diagnostic tools. They are available 24/7 and run in less than 60 seconds, and the upgraded UI provides full access to all historical runs.
First, the curators of the new dataset, Michael Phillips (MikeP) and Michael Oliver (MDO), share their research process in building the new example models and notebooks, and show you how the new data can improve model performance and profitability.