If I have 3 CSV files and want to merge their data into a single file, side by side, how can I do that? For example:
Initial merged file:
,,,,,,,,,,,,
File 1:
20,09/05,5694
20,09/06,3234
20,09/08,2342
File 2:
20,09/05,2341
20,09/06,2334
20,09/09,342
File 3:
20,09/05,1231
20,09/08,3452
20,09/10,2345
20,09/11,372
Final merged file:
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,,,09/11,372
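Basically, each file's data goes into specific columns of the merged file. I know awk can be used for this, but I don't know how to get started.
Edit: only columns 2 and 3 of each file should be printed. I used this to print columns 2 and 3: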
awk -v f="${i}" -F, 'match ($0,f) { print $2","$3 }' file3.csv > d$i.csv
However, if, say, file1 and file2 have nothing on a given row, that row's data shifts to the left. So I came up with this to account for the shift:
awk -v x="${i}" -F, 'match($0, x) { if ($2 == "/NULL") { print "," } else { print $2 "," $3 } }' alld.csv > d$i.csv
Answer 0 (score: 3)
paste already does this:
$ paste -d";" f1 f2 f3 | sed 's/;/,,,/g'
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,09/11,372
Note that paste on its own outputs only a single comma between files:
$ paste -d, f1 f2 f3
09/05,5694,09/05,2341,09/05,1231
09/06,3234,09/06,2334,09/08,3452
09/08,2342,09/09,342,09/10,2345
,,09/11,372
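The reason a longer -d argument does not help is that paste treats it as a list of one-character delimiters used in turn, not as a single string. A small sketch to check this, assuming two-column sample files named f1, f2 and f3 created in a scratch directory (the directory and the variable names single and triple are only for illustration):

```shell
# Work in a scratch directory so the sample files do not clobber anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two-column sample data, as in the examples above.
printf '%s\n' '09/05,5694' '09/06,3234' > f1
printf '%s\n' '09/05,2341' '09/06,2334' > f2
printf '%s\n' '09/05,1231' '09/08,3452' > f3

# -d',,,' is read as the delimiter list "," "," ",", cycled per column,
# so every separator is still one comma.
single=$(paste -d, f1 f2 f3)
triple=$(paste -d',,,' f1 f2 f3)
printf '%s\n' "$triple"
```

Both commands should print the same lines, which is why a second step is needed to widen the separator.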
To get a multi-character separator, we can use a different delimiter, such as ;, and then replace it with ,,, using sed:
$ paste -d";" f1 f2 f3 | sed 's/;/,,,/g'
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,09/11,372
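Putting the pieces together, here is a minimal sketch that starts from the three-column files in the question, keeps only columns 2 and 3 with cut, and then applies the paste and sed step. The names file1.csv, file2.csv and file3.csv and the scratch directory are assumptions for illustration, and the approach assumes the data never contains a ; character.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three-column sample inputs taken from the question (file names assumed).
printf '%s\n' '20,09/05,5694' '20,09/06,3234' '20,09/08,2342' > file1.csv
printf '%s\n' '20,09/05,2341' '20,09/06,2334' '20,09/09,342'  > file2.csv
printf '%s\n' '20,09/05,1231' '20,09/08,3452' '20,09/10,2345' '20,09/11,372' > file3.csv

# Keep only columns 2 and 3 of each file, writing f1, f2 and f3.
n=1
for f in file1.csv file2.csv file3.csv; do
    cut -d, -f2,3 "$f" > "f$n"
    n=$((n + 1))
done

# Join side by side with a placeholder delimiter, then widen it to ",,,".
merged=$(paste -d';' f1 f2 f3 | sed 's/;/,,,/g')
printf '%s\n' "$merged"
```

One thing to be aware of: when a file runs out of lines, paste contributes an empty string rather than an empty two-column pair, so the last row starts with six commas instead of the eight shown in the question's desired output.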
Answer 1 (score: 3)
Using GNU awk for ARGIND:
$ gawk '{ a[FNR,ARGIND]=$0; maxFnr=(FNR>maxFnr?FNR:maxFnr) }
END {
for (i=1;i<=maxFnr;i++) {
for (j=1;j<ARGC;j++)
printf "%s%s", (j==1?"":",,,"), (a[i,j]?a[i,j]:",")
print ""
}
}
' file1 file2 file3
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,,,09/11,372
If you don't have GNU awk, just add an initial line that says FNR==1{ARGIND++}.
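As a sketch of that portable variant: the version below uses a counter named fileNo instead of ARGIND (my own choice of name, so the script behaves the same whether or not the awk in use already maintains ARGIND itself), and tests array membership with in so that a line consisting only of 0 is not mistaken for a missing one. The sample files and scratch directory are assumptions for illustration.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two-column sample data matching the output shown above.
printf '%s\n' '09/05,5694' '09/06,3234' '09/08,2342' > file1
printf '%s\n' '09/05,2341' '09/06,2334' '09/09,342'  > file2
printf '%s\n' '09/05,1231' '09/08,3452' '09/10,2345' '09/11,372' > file3

merged=$(awk '
FNR == 1 { fileNo++ }                 # a new input file has started
{
    a[FNR, fileNo] = $0               # remember line FNR of file fileNo
    if (FNR > maxFnr) maxFnr = FNR    # length of the longest file so far
}
END {
    for (i = 1; i <= maxFnr; i++) {
        for (j = 1; j <= fileNo; j++)
            printf "%s%s", (j == 1 ? "" : ",,,"), (((i, j) in a) ? a[i, j] : ",")
        print ""
    }
}
' file1 file2 file3)
printf '%s\n' "$merged"
```

A caveat with counting files this way: an empty input file never reaches FNR == 1, so it would not get a column of its own.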
Commented version, as requested:
$ gawk '
{ a[FNR,ARGIND]=$0; # Store the current line in a 2-D array `a` indexed by
# the current line number `FNR` and file number `ARGIND`.
maxFnr=(FNR>maxFnr?FNR:maxFnr) # save the max FNR value
}
END{
for (i=1;i<=maxFnr;i++) { # Loop from 1 to the max number of lines
# seen in any of the files and for each:
for (j=1;j<ARGC;j++) # Loop from 1 to total number of files parsed and:
printf "%s%s", # Print 2 strings, specifically:
(j==1?"":",,,"), # A field separator - empty if we are printing
# the first file, three commas otherwise.
(a[i,j]?a[i,j]:",") # The value stored in the array if it was
# present in the files, a comma otherwise.
print "" # Print a newline
}
}
' file1 file2 file3
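I was originally using an array fnr[FNR] to track the max value of FNR, but IMHO that's a bit obscure, and it has a flaw: if no lines had, say, a 2nd field, then a loop on for (i=1;i in fnr;i++) in the END section would bail out before getting to the 3rd field.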
Answer 2 (score: 2)
Using pr:
$ pr -mts',,,' file[1-3]
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,09/11,372