Small tools: Using csvkit to quickly inspect CSV data

csvkit¹ is a suite of command-line tools for simple manipulations of, and statistics on, CSV files. It can also import data into SQL databases ad hoc.

Here I show how to do simple statistics easily and how to use sparklines² for quick plotting in the terminal. For illustration, I use the European Union’s daily COVID-19 statistics³.

csvkit can do much more; go and explore.

$ wget https://opendata.ecdc.europa.eu/covid19/nationalcasedeath_eueea_daily_ei/csv/data.csv
$ csvsql --db sqlite:///covid19.db --insert data.csv
$ csvcut -n data.csv
dateRep
day
month
year
cases
deaths
countriesAndTerritories
geoId
countryterritoryCode
popData2020
continentExp
$ echo 'select year,month,sum(cases) from data group by month order by year,month;' | sqlite3 covid19.db
0|7.0|1920799.0
0|8.0|2475308.0
0|9.0|2532891.0
0|10.0|6016018.0
0|11.0|11312325.0
0|12.0|14230992.0
0|1.0|36056524.0
0|2.0|23847825.0
0|3.0|23225316.0
0|4.0|15142428.0
0|5.0|6717141.0
0|6.0|2574159.0
$ echo 'select sum(cases) from data group by month order by year,month;' | sqlite3 covid19.db | sparklines -n3
      █
     ▁██▇▂
▁▁▁▄▇█████▄▁_

Note (Sun 19 Jun 2022 01:47:22 PM CEST): The sparklines look much better in the terminal; my code-block stylesheet does not render them well.
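Both queries above group by `month` only. If the data covers more than one year, the same calendar month of different years is summed into a single row, and the `year` shown in the first result is just a value taken from some row of each group (SQLite allows such bare columns in aggregate queries). Below is a minimal sketch of the difference. It assumes only the `sqlite3` command-line shell and uses a few made-up toy rows (not ECDC data) in an in-memory database. The `SETUP` variable is a hypothetical helper for this illustration.

```shell
#!/bin/sh
# Hypothetical toy table: made-up rows (NOT ECDC data), two years x two months.
SETUP='create table data (year integer, month integer, cases integer);
insert into data values (2021, 1, 10), (2021, 2, 20), (2022, 1, 30), (2022, 2, 40);'

# Grouping by month alone merges January 2021 with January 2022, and so on.
# sqlite3 without a file name uses a temporary in-memory database.
echo "$SETUP select month, sum(cases) from data group by month order by month;" | sqlite3
# prints:
# 1|40
# 2|60

# Grouping by year and month keeps one row per calendar month.
echo "$SETUP select year, month, sum(cases) from data group by year, month order by year, month;" | sqlite3
# prints:
# 2021|1|10
# 2021|2|20
# 2022|1|30
# 2022|2|40
```

Applied to the database built above, the corresponding fix is `group by year,month` in both queries, e.g. `select year,month,sum(cases) from data group by year,month order by year,month;` (output not shown here). Because `year` prints as 0 in the first result above, it may be worth checking how csvsql typed that column, e.g. with `sqlite3 covid19.db .schema`, before relying on it.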

“csvkit: A suite of utilities for converting to and working with CSV, the king of tabular file formats.”; The csvkit team; URL: https://github.com/wireservice/csvkit ↩
“sparklines”; Gherman, Dinu; URL: https://pypi.org/project/sparklines/ ↩
“Data on the daily number of new reported COVID-19 cases and deaths by EU/EEA country”; European Centre for Disease Prevention and Control; URL: https://www.ecdc.europa.eu/en/publications-data/data-daily-new-cases-covid-19-eueea-country ↩