Get Unique Values of a CSV Column Using a Bash Command
Example input CSV file. The file is named transaction.csv
invoice_number,item_id,qty
#001,1,2
#001,2,2
#002,2,2
#003,1,2
#003,5,3
#004,2,2
#003,5,3
#005,2,2
The goal is to get the list of unique invoice_number values:
#001
#002
#003
#004
#005
Steps
- Run awk with a comma as the field separator (-F ','), skip the header row (NR > 1), and print the first column.
awk -F ',' 'NR > 1 {print $1}' transaction.csv
- Pipe the result to sort -u to sort it and drop duplicates. Add -r to get reverse order.
- The final command is
awk -F ',' 'NR > 1 {print $1}' transaction.csv | sort -u
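The steps above can be sketched end to end as follows. This is a minimal sketch, not the only way to do it: it recreates the sample file in a temporary directory so it can be run anywhere, and the variable names (tmpdir, unique_invoices, unique_invoices_desc) are illustrative assumptions. The NR > 1 condition skips the header row so that invoice_number itself does not show up in the result.

```shell
#!/usr/bin/env bash
# Recreate the sample data in a temporary directory (assumed location).
tmpdir=$(mktemp -d)
cat > "$tmpdir/transaction.csv" <<'EOF'
invoice_number,item_id,qty
#001,1,2
#001,2,2
#002,2,2
#003,1,2
#003,5,3
#004,2,2
#003,5,3
#005,2,2
EOF

# NR > 1 skips the header row; $1 is the first comma-separated field.
unique_invoices=$(awk -F ',' 'NR > 1 {print $1}' "$tmpdir/transaction.csv" | sort -u)
echo "$unique_invoices"

# Same list in reverse order: -r reverses the sort, -u still drops duplicates.
unique_invoices_desc=$(awk -F ',' 'NR > 1 {print $1}' "$tmpdir/transaction.csv" | sort -ru)
echo "$unique_invoices_desc"

# Clean up the temporary file.
rm -rf "$tmpdir"
```

To pick a different column, change $1 to the column position, for example $2 for item_id. Note that splitting on a plain comma assumes no field contains a quoted comma.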
Reference
- More detail about the awk command: http://tldp.org/LDP/abs/html/awk.html
- More detail about sort -u vs sort | uniq: performance discussion in a Stack Overflow thread
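As a small illustration of the two variants named in the last reference, the sketch below runs both on the invoice numbers from the sample file and checks that they give the same list. The inline sample and the variable names (invoices, with_sort_u, with_uniq) are assumptions made for this example, and it says nothing about which variant is faster.

```shell
#!/usr/bin/env bash
# Assumed inline sample: the first column of transaction.csv, header excluded.
invoices=$(printf '%s\n' '#001' '#001' '#002' '#003' '#003' '#004' '#003' '#005')

# Variant 1: sort and deduplicate in a single process.
with_sort_u=$(printf '%s\n' "$invoices" | sort -u)

# Variant 2: sort first, then let uniq drop adjacent duplicates.
with_uniq=$(printf '%s\n' "$invoices" | sort | uniq)

# For this input both pipelines should yield the same list.
if [ "$with_sort_u" = "$with_uniq" ]; then
  echo "same output"
fi

# One thing sort -u cannot do: uniq -c prefixes each value with its count.
printf '%s\n' "$invoices" | sort | uniq -c
```

The sort | uniq form is mainly useful when a uniq option such as -c (counts) or -d (only duplicated lines) is needed; otherwise sort -u is the shorter spelling.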