I have an employees.csv file with about 500 rows and 11 columns; the fields are enclosed in double quotes:
"1","Paula","Paula's Role","Paula's Job Description","Paula's Department","11/10/2008","8","14","10","24","0"
"2","John","John's Role","John's Job Description","John's Department","11/10/2008","2","17","6","11","0"
"3","Mark","Mark's Role","Mark's Job Description","Mark's Department","11/10/2008","4","17","13","44","0"
:
:
(more records)
:
:
"499","Maria","Maria's Role","Maria's Job Description","Maria's Department","11/10/2008","8","15","2","9","0"
"500","Peter","Peter's Role","Peters's Job Description","Peters's Department","11/10/2008","8","17","16","22","0"
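Based on the first field (the unique employee ID), I am trying to work out how to split this file into multiple CSV files (one row = one file). The command should produce 500 separate CSV files, each containing one row and named as follows: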
1.csv
2.csv
3.csv
:
:
:
499.csv
500.csv
I have been trying a combination of cat and awk, but there is some error in my code:
for i in $(cat unix | awk -F\, '{print $1}' /myfolder/employees.csv);
do
grep $i "/myfolder/employees.csv" > "/myfolder/splittedfiles/$i";
done
Many thanks.
Answer 0 (score: 1)
You can use GNU awk like this:
awk 'BEGIN {FPAT="[^\"]+"} { print $0 > "/myfolder/splittedfiles/"$1".csv" }' yourfile
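FPAT defines the content of a field by a regular expression (here, any run of characters other than a double quote), which strips the quotes from $1.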
Answer 1 (score: 0)
Edited (and tested this): the following gawk script did the job for me:
gawk -F'"' -- '{print $0 >> ("/myfolder/splittedfiles/" $2 ".csv")}' /myfolder/employees.csv
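-F'"' splits the fields at each ", so the employee number ends up in $2. Then ("/myfolder/splittedfiles/" $2 ".csv") builds the file name you want, and print $0 >> ... appends the original line to that file.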
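If gawk is not available, or your awk complains about too many open files, a portable variant might look like the sketch below. It assumes the employee IDs are unique (a repeated ID would overwrite the earlier file) and uses a scratch directory with three sample rows purely for illustration; the names workdir and outdir are made up for this example.

```shell
# Illustration only: a scratch directory and three sample rows stand in for the real data.
workdir=$(mktemp -d)
mkdir -p "$workdir/splittedfiles"
cat > "$workdir/employees.csv" <<'EOF'
"1","Paula","Paula's Role","Paula's Job Description","Paula's Department","11/10/2008","8","14","10","24","0"
"2","John","John's Role","John's Job Description","John's Department","11/10/2008","2","17","6","11","0"
"3","Mark","Mark's Role","Mark's Job Description","Mark's Department","11/10/2008","4","17","13","44","0"
EOF

# Splitting on " puts the employee ID in $2.
# close() releases each output file right after it is written,
# so awk never keeps more than one output file open at a time.
awk -F'"' -v outdir="$workdir/splittedfiles" '{
    out = outdir "/" $2 ".csv"
    print $0 > out
    close(out)
}' "$workdir/employees.csv"

ls "$workdir/splittedfiles"
```

For the real data you would point outdir at /myfolder/splittedfiles and pass /myfolder/employees.csv as the input file.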
Or, if the records are always in numerical order starting from 1, this should work (untested):
split -l 1 /myfolder/employees.csv /myfolder/splittedfiles/EMPL
empno=1
for fname in /myfolder/splittedfiles/EMPL* ; do
mv "$fname" "/myfolder/splittedfiles/${empno}.csv"
empno=$((empno+1))
done
split makes each line (-l 1) a separate file. The for loop walks through those files in order. Starting from empno=1, mv renames each file to ${empno}.csv, and then $((empno+1)) increments empno.
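Because the rename loop relies on line N holding employee N, it may be worth checking that assumption before running it. A minimal sketch, using a small made-up sample file in a scratch directory (checkdir and ids_sequential are names invented for this example):

```shell
# Illustration only: a small sample file in a scratch directory.
checkdir=$(mktemp -d)
printf '%s\n' '"1","Paula"' '"2","John"' '"3","Mark"' > "$checkdir/employees.csv"

# Split on " so the ID is in $2, then compare it with the line number NR.
# awk exits with status 0 only when every line N carries ID N.
if awk -F'"' '$2 != NR { printf "line %d has ID %s\n", NR, $2; bad = 1 }
              END { exit bad + 0 }' "$checkdir/employees.csv"
then
    ids_sequential=yes
else
    ids_sequential=no
fi
echo "IDs sequential: $ids_sequential"
```

If the check reports a mismatch, the awk-based answers are the safer choice, since they take the file name from the ID itself rather than from the line position.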