Small tools: Using csvkit to quickly inspect CSV data
csvkit [1] is a collection of command-line utilities that perform simple manipulations and statistics on CSV files and can also import data into SQL databases ad hoc.
Here I show how to compute simple statistics with it, and how to use sparklines [2] for quick plotting in the terminal. For illustration, I use the European Union's daily COVID-19 statistics [3].
csvkit can do much more; go and explore.
$ wget https://opendata.ecdc.europa.eu/covid19/nationalcasedeath_eueea_daily_ei/csv/data.csv
$ csvsql --db sqlite:///covid19.db --insert data.csv
$ csvcut -n data.csv
1: dateRep
2: day
3: month
4: year
5: cases
6: deaths
7: countriesAndTerritories
8: geoId
9: countryterritoryCode
10: popData2020
11: continentExp
$ echo 'select year,month,sum(cases) from data group by year,month order by year,month;' | sqlite3 covid19.db
2021.0|7.0|1920799.0
2021.0|8.0|2475308.0
2021.0|9.0|2532891.0
2021.0|10.0|6016018.0
2021.0|11.0|11312325.0
2021.0|12.0|14230992.0
2022.0|1.0|36056524.0
2022.0|2.0|23847825.0
2022.0|3.0|23225316.0
2022.0|4.0|15142428.0
2022.0|5.0|6717141.0
2022.0|6.0|2574159.0
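As a cross-check on the monthly totals, the same aggregation can be sketched directly on the CSV file with awk. This is only a rough sketch: the function name `monthly_sums` is made up for illustration, the field positions are taken from the `csvcut -n` listing above (month is field 3, year is field 4, cases is field 5), and it assumes that no field contains a quoted comma, which csvkit would handle properly and this does not. Grouping on both year and month keeps the same month from different years apart.

```shell
# monthly_sums FILE: sum the "cases" column per (year, month).
# Hypothetical helper for illustration; assumes month=field 3,
# year=field 4, cases=field 5, and no quoted commas in the data.
monthly_sums() {
    awk -F, '
    NR > 1 { total[$4 "-" sprintf("%02d", $3)] += $5 }   # skip the header row
    END    { for (key in total) printf "%s|%d\n", key, total[key] }
    ' "$1" | sort    # awk iterates keys in no fixed order, so sort them
}

# Usage: monthly_sums data.csv
```

If the totals disagree with the SQL result, the likely culprits are quoted fields or empty `cases` values, which this naive version does not treat specially.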
$ echo 'select sum(cases) from data group by year,month order by year,month;' | sqlite3 covid19.db | sparklines -n3
      █
     ▁██▇▂
▁▁▁▄▇█████▄▁
Note (Sun 19 Jun 2022 01:47:22 PM CEST): Sparklines look much better in the terminal; my code-block stylesheet doesn't render them well enough.
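To make the three-row output easier to read, here is a minimal sketch of how a multi-row sparkline can be built: each value is scaled to a bar height measured in eighths of a text row, and every row prints the part of each bar that falls inside it. The function name `multiline_spark` and the rounding rule are my own assumptions; this is not the sparklines package's implementation, and its output is not guaranteed to match `sparklines -n3` glyph for glyph. It also assumes non-negative input values.

```shell
# multiline_spark ROWS: read one number per line on stdin and print a
# sparkline that is ROWS text rows tall.
# Hypothetical helper for illustration, not part of the sparklines package.
multiline_spark() {
    awk -v rows="${1:-1}" '
    BEGIN { split("▁ ▂ ▃ ▄ ▅ ▆ ▇ █", block, " ") }
    { v[NR] = $1; if ($1 > max) max = $1 }
    END {
        if (NR == 0) exit
        levels = rows * 8                    # 8 glyph heights per text row
        for (r = rows; r >= 1; r--) {        # print the top row first
            line = ""
            for (i = 1; i <= NR; i++) {
                # total bar height in eighths, rounded to nearest
                h = (max > 0) ? int(v[i] / max * levels + 0.5) : 0
                if (h < 1) h = 1             # always show at least the lowest glyph
                part = h - (r - 1) * 8       # eighths that fall in this row
                if (part <= 0)      line = line " "
                else if (part >= 8) line = line block[8]
                else                line = line block[part]
            }
            sub(/ +$/, "", line)             # drop trailing blanks
            print line
        }
    }'
}

# Usage: printf '%s\n' 1 4 8 | multiline_spark 2
```

The leading blanks in the upper rows are what keep each bar in its own column, which is why the chart only reads correctly in a monospaced font.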
[1] "csvkit: A suite of utilities for converting to and working with CSV, the king of tabular file formats."; The csvkit team; URL: https://github.com/wireservice/csvkit
[2] "sparklines"; Gherman, Dinu; URL: https://pypi.org/project/sparklines/
[3] "Data on the daily number of new reported COVID-19 cases and deaths by EU/EEA country"; European Centre for Disease Prevention and Control; URL: https://www.ecdc.europa.eu/en/publications-data/data-daily-new-cases-covid-19-eueea-country