Split a CSV into multiple files, using column 1 to name the output files

Time: 2017-01-14 19:04:36

Tags: linux bash csv awk

I have an employees.csv file with about 500 rows and 11 columns; every field is enclosed in double quotes:

"1","Paula","Paula's Role","Paula's Job Description","Paula's Department","11/10/2008","8","14","10","24","0"
"2","John","John's Role","John's Job Description","John's Department","11/10/2008","2","17","6","11","0"
"3","Mark","Mark's Role","Mark's Job Description","Mark's Department","11/10/2008","4","17","13","44","0"
:
:
(more records)
:
:
"499","Maria","Maria's Role","Maria's Job Description","Maria's Department","11/10/2008","8","15","2","9","0"
"500","Peter","Peter's Role","Peters's Job Description","Peters's Department","11/10/2008","8","17","16","22","0"

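Based on the first field (the unique employee ID), I am trying to figure out how to split this file into multiple CSVs (one line = one file). The command should produce 500 separate CSV files, each containing one line and named as follows: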

1.csv
2.csv
3.csv
:
:
:
499.csv
500.csv

I have been trying a combination of cat and awk, but there are some errors in my code:

for i in $(cat unix | awk -F\, '{print $1}' /myfolder/employees.csv);

    do
        grep $i "/myfolder/employees.csv" > "/myfolder/splittedfiles/$i";
    done

Thanks very much.

2 answers:

Answer 0 (score: 1)

You can use GNU awk like this:

awk 'BEGIN { FPAT = "[^\"]+" } { print $0 > ("/myfolder/splittedfiles/" $1 ".csv") }' yourfile

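FPAT defines what a field's content looks like with a regular expression, rather than defining the separator. Here every run of non-quote characters is a field, so $1 is the employee ID with the quotes already stripped.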
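To see what this FPAT actually produces, here is a small sketch (it needs gawk; the helper name fpat_fields and the sample line are made up for illustration). It prints the field count, then $1 and $3. Note that the commas between quoted values also count as fields under this pattern.

```shell
# fpat_fields is a hypothetical helper: it reads lines on stdin and prints
# NF, $1 and $3 as gawk sees them with FPAT = "[^\"]+".
fpat_fields() {
    gawk 'BEGIN { FPAT = "[^\"]+" } { print NF; print $1; print $3 }'
}

# Illustrative line, not taken from the real file.
# Fields are: 1 | , | Paula | , | Role A
printf '%s\n' '"1","Paula","Role A"' | fpat_fields
```

So only $1 is used for the file name; the remaining fields are not clean column values, which is fine here because the whole line ($0) is what gets written.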

Answer 1 (score: 0)

Edited (and tested): this gawk script did the job for me:

gawk -F'"' -- '{print $0 >> ("/myfolder/splittedfiles/" $2 ".csv")}' /myfolder/employees.csv

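-F'"' splits the fields at each ", so the employee number ends up in $2. ("/myfolder/splittedfiles/" $2 ".csv") then builds the file name you want, and print $0 >> ... appends the original line to that file.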
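If gawk is not available, the same idea can be sketched with a plain POSIX awk. Some awk implementations limit how many output files may be open at once, so this variant closes each file right after writing to it. The function name split_by_id, its two arguments, and the sample records are assumptions made for the example.

```shell
# split_by_id INPUT OUTDIR  (hypothetical helper name and arguments)
split_by_id() {
    awk -F'"' -v dir="$2" '
        $2 != "" {                      # skip lines with no quoted ID
            out = dir "/" $2 ".csv"
            print $0 >> out             # append the unmodified line
            close(out)                  # release the descriptor right away
        }
    ' "$1"
}

# Try it on two made-up records in a scratch directory.
demo=$(mktemp -d)
printf '%s\n' '"1","Paula","Role A"' '"2","John","Role B"' > "$demo/employees.csv"
mkdir "$demo/out"
split_by_id "$demo/employees.csv" "$demo/out"
ls "$demo/out"
```

Because the line is appended with >>, running the command twice leaves two copies of each line in every file; empty the output directory first if you re-run it.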

If the records are always in numerical order starting from 1, this should also work (untested):

split -l 1 /myfolder/employees.csv /myfolder/splittedfiles/EMPL
empno=1
for fname in /myfolder/splittedfiles/EMPL* ; do
    mv "$fname" "/myfolder/splittedfiles/${empno}.csv"
    empno=$((empno+1))
done

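split turns each line (-l 1) into its own file. The for loop walks through those files in order; with split's default two-letter suffixes (EMPLaa, EMPLab, ...), the shell glob returns them in the same order as the input lines. mv renames each file to ${empno}.csv, starting with empno=1. Then $((empno+1)) increments empno.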
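The split-and-rename approach depends on the IDs being consecutive and starting at 1. As a rough alternative that takes the name from each line itself, a shell-only loop could look like the sketch below. The function name split_lines_by_id and its arguments are hypothetical, and it assumes every line starts with a quoted ID that is safe to use as a file name (no slashes).

```shell
# split_lines_by_id INPUT OUTDIR  (hypothetical helper name and arguments)
split_lines_by_id() {
    # The extra test keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        id=${line#\"}        # drop the leading double quote
        id=${id%%\"*}        # keep only the text before the next quote
        [ -n "$id" ] || continue
        printf '%s\n' "$line" > "$2/$id.csv"
    done < "$1"
}

# Made-up records with non-consecutive IDs, in a scratch directory.
demo=$(mktemp -d)
printf '%s\n' '"7","Ann","Role A"' '"3","Bob","Role B"' > "$demo/employees.csv"
mkdir "$demo/out"
split_lines_by_id "$demo/employees.csv" "$demo/out"
ls "$demo/out"
```

This starts no external process per line, but for large files the awk one-liners are likely to be faster.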