Datasets
Key Concepts
Listing Process
Datasets contributed to LEAN can be quickly listed in the QuantConnect Dataset Marketplace, and distributed for sale to more than 250,000 users in the QuantConnect community. To list a dataset, reach out to the QuantConnect Team for a quick review, then proceed with the data creation and process steps in the following pages.
Datasets must be well defined, with realistic timestamps for when the data was available ("point in time"). Ideally, a dataset has at least a two-year track record and is maintained by a reputable company. It should be accompanied by full documentation and code examples so the community can harness the data.
Data Sources
The GetSource (get_source in Python) method of your dataset class instructs LEAN where to find the data. This method must return a SubscriptionDataSource object, which contains the data location and format. We host your data, so the transportMedium (transport_medium in Python) must be SubscriptionTransportMedium.LocalFile and the format must be FileFormat.Csv.
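The following is a minimal sketch of the shape such a method can take. It defines small stand-in versions of SubscriptionDataSource, SubscriptionTransportMedium, and FileFormat so that it runs without LEAN installed; in a real dataset these types come from LEAN, and the method receives a subscription configuration object rather than a bare ticker. The class name MyCustomDataset, the ticker parameter, and the folder layout are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


# Stand-ins for the LEAN types, defined here only so the sketch is runnable.
class SubscriptionTransportMedium(Enum):
    LOCAL_FILE = "LocalFile"


class FileFormat(Enum):
    CSV = "Csv"


@dataclass
class SubscriptionDataSource:
    source: str
    transport_medium: SubscriptionTransportMedium
    format: FileFormat


class MyCustomDataset:
    """Hypothetical dataset class; the path layout below is an assumption."""

    def get_source(self, ticker: str, day: date, is_live_mode: bool) -> SubscriptionDataSource:
        # One CSV file per ticker under a vendor/dataset folder (illustrative layout).
        path = "/".join(["alternative", "myvendor", "mydataset", f"{ticker.lower()}.csv"])
        # Hosted data is read from local files in CSV format.
        return SubscriptionDataSource(
            path,
            SubscriptionTransportMedium.LOCAL_FILE,
            FileFormat.CSV,
        )
```

Calling MyCustomDataset().get_source("AAPL", date(2024, 1, 2), False) returns a source whose path ends in aapl.csv, with the local-file transport medium and the CSV format.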
TimeZones
The DataTimeZone method of your data source class declares the time zone of your dataset. This method returns a NodaTime.DateTimeZone object. If your dataset provides trading data and universe data, the DataTimeZone methods in your Lean.DataSource.<vendorNameDatasetName> / <vendorNameDatasetName>.cs and Lean.DataSource.<vendorNameDatasetName> / <vendorNameDatasetName>Universe.cs files must return the same time zone.
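One way to keep the two declarations from drifting apart is to have both classes return a single shared value. The sketch below illustrates the idea in plain Python, using datetime.timezone as a stand-in for NodaTime.DateTimeZone. The class names, the DATASET_TIME_ZONE constant, and the choice of UTC are assumptions for illustration, not part of LEAN.

```python
from datetime import timezone, tzinfo

# Single source of truth for the dataset's time zone (UTC is an assumed example).
DATASET_TIME_ZONE: tzinfo = timezone.utc


class MyCustomDataset:
    """Hypothetical trading-data class."""

    def data_time_zone(self) -> tzinfo:
        return DATASET_TIME_ZONE


class MyCustomDatasetUniverse:
    """Hypothetical universe-data class for the same dataset."""

    def data_time_zone(self) -> tzinfo:
        # Reuse the same constant so both classes always agree.
        return DATASET_TIME_ZONE
```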
Linked Datasets
Your dataset is linked if any of the following statements are true:
- Your dataset describes market price properties of specific securities (for example, the closing price of AAPL).
- Your alternative dataset is linked to individual securities (for example, the Wikipedia page view count of AAPL).
An example of an unlinked dataset is the weather in New York City, where the data is not tied to a specific security.
When a dataset is linked, it needs to be mapped to underlying assets through time.
The RequiresMapping boolean instructs LEAN to handle the security and ticker mapping issues.
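To illustrate what mapping "through time" means, the sketch below resolves which ticker a security traded under on a given date, and shows a dataset class that opts in to mapping. This is a conceptual illustration only, not LEAN's mapping implementation: the ticker history, the tickers OLDCO and NEWCO, the ticker_on helper, and the class name are all hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical ticker history for one security:
# (first date the ticker applied, ticker), sorted by date.
TICKER_HISTORY = [
    (date(2015, 1, 1), "OLDCO"),
    (date(2021, 6, 1), "NEWCO"),
]


def ticker_on(day: date) -> str:
    """Return the ticker in effect on `day` for the hypothetical security."""
    starts = [start for start, _ in TICKER_HISTORY]
    # Index of the last history entry whose start date is on or before `day`.
    index = bisect_right(starts, day) - 1
    if index < 0:
        raise ValueError(f"no ticker known on {day}")
    return TICKER_HISTORY[index][1]


class MyCustomDataset:
    """Hypothetical linked dataset that asks for ticker mapping."""

    def requires_mapping(self) -> bool:
        # The dataset is linked to individual securities, so mapping is needed.
        return True
```

With this history, a data point dated in 2020 belongs to OLDCO and one dated on or after June 1, 2021 belongs to NEWCO, even though both describe the same underlying security.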