
Crowdsourced air traffic data from The OpenSky Network 2020

The data in this dataset is derived and cleaned from the full OpenSky dataset to illustrate the development of air traffic during the COVID-19 pandemic. It spans all flights seen by the network's more than 2500 members since 1 January 2019. More data will be periodically included in the dataset until the end of the COVID-19 pandemic.

Source: https://zenodo.org/records/5092942

Martin Strohmeier, Xavier Olive, Jannis Luebbe, Matthias Schaefer, and Vincent Lenders. "Crowdsourced air traffic data from the OpenSky Network 2019–2020." Earth System Science Data 13(2), 2021. https://doi.org/10.5194/essd-13-357-2021

Download the Dataset

Run the command:

The download takes about 2 minutes on a good internet connection. There are 30 files with a total size of 4.3 GB.
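One way to fetch the files, sketched under the assumption that the Zenodo record page links each file with an absolute URL ending in `/files/flightlist_<start>_<end>.csv.gz` (the page layout is not described here and may differ). The function names `extract_flightlist_urls` and `download_flightlists` are hypothetical helpers introduced for illustration.

```shell
# Assumption: the record page contains absolute links of the form
#   https://zenodo.org/records/5092942/files/flightlist_<start>_<end>.csv.gz
RECORD_URL='https://zenodo.org/records/5092942'

# Read HTML on stdin and print each distinct flightlist URL once.
extract_flightlist_urls() {
  grep -oE 'https://zenodo\.org/records/5092942/files/flightlist_[0-9]+_[0-9]+\.csv\.gz' |
    sort -u
}

# Fetch the record page, extract the file URLs, and download them one by one.
download_flightlists() {
  wget -q -O- "$RECORD_URL" | extract_flightlist_urls | xargs -n 1 wget -q
}
```

Running `download_flightlists` in an empty directory should leave the `flightlist_*.csv.gz` files there, provided the assumption about the page layout holds.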

Create the Table
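A possible table definition, given as a sketch rather than the authoritative schema. The table name `opensky`, the column names and types, and the sorting key are all assumptions, chosen on the expectation that the CSV header of the flightlist files uses these column names. The helper name `opensky_ddl` is hypothetical.

```shell
# Print an assumed CREATE TABLE statement for the flight list data.
# Every name and type below is an assumption; compare it with the header
# row of a downloaded file before relying on it.
opensky_ddl() {
  cat <<'SQL'
CREATE TABLE opensky
(
    callsign String,
    number String,
    icao24 String,
    registration String,
    typecode String,
    origin String,
    destination String,
    firstseen DateTime,
    lastseen DateTime,
    day DateTime,
    latitude_1 Float64,
    longitude_1 Float64,
    altitude_1 Float64,
    latitude_2 Float64,
    longitude_2 Float64,
    altitude_2 Float64
) ENGINE = MergeTree ORDER BY (origin, destination, callsign);
SQL
}
```

With a reachable server, the statement could be applied with `clickhouse-client --query "$(opensky_ddl)"`.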

Import Data

Upload data into ClickHouse in parallel:

  • We pass the list of files (ls -1 flightlist_*.csv.gz) to xargs for parallel processing. xargs -P100 allows up to 100 parallel workers, but since there are only 30 files, at most 30 workers will run.
  • For every file, xargs runs a script with bash -c. The script contains the placeholder {}, which xargs replaces with the filename (as requested with -I{}).
  • The script decompresses the file (gzip -c -d "{}") to standard output (the -c parameter), and the output is piped to clickhouse-client.
  • We also ask clickhouse-client to parse DateTime fields with the extended parser (--date_time_input_format best_effort) so that it recognizes the ISO-8601 format with timezone offsets.

Finally, clickhouse-client performs the insertion, reading the input data in CSVWithNames format.
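The steps above can be sketched as a small shell function. This is an illustration under assumptions, not necessarily the exact command used: the function name `load_parallel` is hypothetical, the target table is assumed to be called `opensky`, and the consumer command is passed as arguments so the pipeline can be tried with a stand-in such as `wc -l` before pointing it at a server.

```shell
# Decompress every flightlist_*.csv.gz in the current directory in parallel
# and pipe each file into the command given as arguments.
#
# Usage (assumes a table named `opensky` and default connection settings):
#   load_parallel clickhouse-client --date_time_input_format best_effort \
#     --query "INSERT INTO opensky FORMAT CSVWithNames"
load_parallel() {
  ls -1 flightlist_*.csv.gz |
    xargs -P100 -I{} bash -c 'gzip -c -d "{}" | "$@"' _ "$@"
}
```

Each worker receives one filename in place of `{}`, and the trailing `_ "$@"` hands the consumer command to the inner `bash -c` script as its positional parameters. Calling `load_parallel wc -l` prints one line count per file, which is a cheap way to confirm that the files decompress cleanly.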

Parallel upload takes 24 seconds.

If you prefer not to upload in parallel, here is a sequential variant:
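A minimal sketch of a sequential loop, under the same caveats as any reconstruction: the function name `load_sequential` is hypothetical, the table name `opensky` in the usage comment is an assumption, and the consumer command is passed as arguments so that it can be swapped for a stand-in while testing.

```shell
# Decompress each flightlist_*.csv.gz in turn and pipe it into the command
# given as arguments.
#
# Usage (assumes a table named `opensky` and default connection settings):
#   load_sequential clickhouse-client --date_time_input_format best_effort \
#     --query "INSERT INTO opensky FORMAT CSVWithNames"
load_sequential() {
  for file in flightlist_*.csv.gz; do
    # Skip the unexpanded pattern when no files match.
    [ -e "$file" ] || continue
    gzip -c -d "$file" | "$@"
  done
}
```

Files are processed in the shell's glob order, one at a time, so a failure is easy to trace to a specific file.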

Validate the Data

Query:

Result:

The dataset takes up just 2.66 GiB in ClickHouse. You can check this yourself.

Query:

Result:

Run Some Queries

Total distance travelled is 68 billion kilometers.

Query:

Result:

Average flight distance is around 1000 km.

Query:

Result:

Busiest origin airports and their average flight distance

Query:

Result:

Weekly number of flights from three major Moscow airports

Query:

Result:

Online Playground

You can run other queries against this dataset in the interactive Online Playground. Note, however, that you cannot create temporary tables there.