glennj's github pages

learnin'


Project maintained by glennj

Tools for processing CSV files

GNU awk

GNU awk’s FPAT variable specifies the pattern that a field’s contents match, rather than the field separator. It is not without significant edge cases:

BEGIN {
    # A field is either a run of non-comma characters or a double-quoted string.
    FPAT = "([^,]+)|(\"[^\"]+\")"
}

{
    print "NF = ", NF
    for (i = 1; i <= NF; i++) {
        printf("$%d = <%s>\n", i, $i)
    }
}

or, as a one-liner that prints the last field of each record:

awk -v FPAT='([^,]+)|("[^"]+")' '{print $NF}' file.csv

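This pattern cannot handle newlines in quoted fields or doubled double-quotes in quoted fields. Because each alternative requires at least one character, it also skips empty fields, which shifts the numbering of the fields after them.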
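
To see where this FPAT works and where it falls over, here is a minimal sketch that counts fields for two made-up input lines. The `count_fields` helper and the sample records are illustrative assumptions; the only requirement is that `gawk` is on the PATH, since FPAT is a gawk extension.

```shell
# Hypothetical helper: print NF for one input line, using the FPAT from the text.
count_fields() {
    printf '%s\n' "$1" | gawk -v FPAT='([^,]+)|("[^"]+")' '{ print NF }'
}

quoted=$(count_fields 'a,"b,c",d')   # the comma inside the quotes stays in one field
empty=$(count_fields 'a,,c')         # the empty middle field is skipped

echo "quoted comma: NF=$quoted"      # 3
echo "empty field:  NF=$empty"       # 2, although the record has 3 fields
```

The second result illustrates an edge case that is easy to miss: with `+` in both alternatives, a field must contain at least one character, so `a,,c` yields only two fields.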

Perl’s Text::CSV

Somewhat awkward to work with, but handles embedded newlines well.
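
A rough sketch of what that looks like from the command line, assuming the Text::CSV module is installed from CPAN. The sample data is made up; the second record has a quoted field containing a newline, and the `binary => 1` option is what lets the parser accept it.

```shell
# Two records; the second one has a newline inside its quoted field.
fields=$(printf 'id,note\n1,"line one\nline two"\n' |
    perl -MText::CSV -e '
        my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
        while (my $row = $csv->getline(*STDIN)) {
            print scalar(@$row), "\n";   # number of fields in this record
        }
    ')
echo "$fields"   # one count per record: 2, then 2
```

Although the input spans three physical lines, the parser reports two records of two fields each.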

csvkit

A suite of tools, implemented in Python. Install with pip.
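
As a small illustration, `csvcut` (one of the csvkit tools) can select columns by name. This sketch assumes csvkit is already installed, for example with `pip install csvkit`; the temporary file and its contents are invented for the example.

```shell
# Build a throwaway CSV file with a quoted field that contains a comma.
csv_sample=$(mktemp)
printf 'name,city,age\nAda,"London, UK",36\n' > "$csv_sample"

# Keep only the name and age columns; the quoted comma does not confuse the parser.
cols=$(csvcut -c name,age "$csv_sample")
echo "$cols"

rm -f "$csv_sample"
```

The suite also includes `csvlook` for rendering a file as a table and `csvgrep` for filtering rows.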

miller

General-purpose blender of data: the mlr command queries, reshapes, and converts records in formats such as CSV, TSV, and JSON.
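
A minimal sketch of Miller at work, assuming the `mlr` command is installed. The sample records are made up; `--csv` tells Miller to read and write CSV, and `sort -f` sorts lexically ascending on the named field.

```shell
# Sort records by the name column, keeping the header line in place.
sorted=$(printf 'name,age\nBob,41\nAda,36\n' | mlr --csv sort -f name)
echo "$sorted"
```

Because Miller addresses fields by header name rather than by position, the same command keeps working if the columns are reordered.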

datamash

“GNU datamash is a command-line program which performs basic numeric, textual and statistical operations on input textual data files.”
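
A short sketch of a numeric operation, assuming `datamash` is installed and using invented sample data. `-t,` sets the field separator to a comma and `--header-in` skips the header line. As far as I know, datamash splits purely on the separator and does not interpret CSV quoting, so this suits simple files without quoted commas.

```shell
# Sum the second column of a simple comma-separated file.
total=$(printf 'name,age\nAda,36\nBob,41\n' | datamash -t, --header-in sum 2)
echo "$total"   # 77
```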