glennj's github pages


learnin'


Tools for processing CSV files

GNU awk

GNU awk can split simple CSV, though not without significant edge cases. Use the FPAT variable to specify not the field separator, but the pattern of what a field contains:

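BEGIN {
    # A field is either a run of non-commas (possibly empty; "*" rather
    # than "+" keeps empty fields) or a double-quoted string that may
    # contain commas.
    FPAT = "([^,]*)|(\"[^\"]+\")"
}

{
    print "NF =", NF
    for (i = 1; i <= NF; i++) {
        printf("$%d = <%s>\n", i, $i)
    }
}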

or

gawk -v FPAT='([^,]*)|("[^"]+")' '{print $NF}' file.csv

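This cannot handle newlines inside quoted fields, or doubled double quotes (the CSV escape for a literal ") inside quoted fields. Quoted fields also keep their surrounding double quotes in $i.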

Perl’s Text::CSV

Somewhat awkward to work with, but handles newlines embedded in quoted fields well.

csvkit

A suite of command-line tools (csvcut, csvgrep, csvlook, csvsql and others), implemented in Python. Install with pip: pip install csvkit

miller

A general-purpose blender of data: like awk, sed, cut, join and sort for name-indexed data such as CSV, TSV and JSON.

datamash

“GNU datamash is a command-line program which performs basic numeric, textual and statistical operations on input textual data files.”