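How to add data side by side in a CSV file

Time: 2013-10-24 13:54:00

Tags: linux csv sed awk

If I have 3 CSV files and want to merge their data into a single file, side by side, how do I do that? For example:

Initial merged file: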

,,,,,,,,,,,,

File 1:

20,09/05,5694
20,09/06,3234
20,09/08,2342

File 2:

20,09/05,2341
20,09/06,2334
20,09/09,342

File 3:

20,09/05,1231
20,09/08,3452
20,09/10,2345
20,09/11,372

Final merged file:

09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,,,09/11,372

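Basically, each file's data goes into specific columns of the merged file. I know awk can be used for this, but I don't know how to get started.

Edit: only columns 2 and 3 of each file should be printed. I used this to print columns 2 and 3: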

awk -v f="${i}" -F, 'match ($0,f) { print $2","$3 }' file3.csv > d$i.csv

However, if, say, file1 and file2 have no data for a given row, that row's data shifts to the left. So I came up with this to account for the shift:

awk -v x="${i}" -F, 'match($0, x) { if ($2 == "/NULL") { print "," } else { print $2 "," $3 } }' alld.csv > d$i.csv

3 answers:

Answer 0: (score: 3)

paste already does this:

$ paste -d";" f1 f2 f3 | sed 's/;/,,,/g'
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,09/11,372

Note that paste -d takes single-character delimiters, so on its own it can only put one comma between files:

$ paste -d, f1 f2 f3
09/05,5694,09/05,2341,09/05,1231
09/06,3234,09/06,2334,09/08,3452
09/08,2342,09/09,342,09/10,2345
,,09/11,372

To get a multi-character separator, we can use another delimiter, such as ;, and then replace it with ,,, using sed:

$ paste -d";" f1 f2 f3 | sed 's/;/,,,/g'
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,09/11,372
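
The last row above has six leading commas, while the question's expected output has eight, because paste emits nothing at all for a file that has run out of lines. A minimal sketch of one way to pad those cells, assuming the two-column inputs are named f1, f2 and f3 as above, that the data never contains a ';', and that a missing cell should be written as a single ',' (the sample data is copied from the question; the output name merged.csv is only illustrative):

```shell
# Work in a scratch directory and recreate the two-column inputs from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '09/05,5694' '09/06,3234' '09/08,2342' > f1
printf '%s\n' '09/05,2341' '09/06,2334' '09/09,342' > f2
printf '%s\n' '09/05,1231' '09/08,3452' '09/10,2345' '09/11,372' > f3

# Join with ';' first, then let awk fill empty cells and rebuild each row with ',,,'.
paste -d';' f1 f2 f3 |
awk -F';' -v OFS=',,,' '{
    for (k = 1; k <= NF; k++)
        if ($k == "") $k = ","    # a file with no line here becomes an empty date,value pair
    $1 = $1                       # force the row to be rebuilt with OFS
    print
}' > merged.csv

cat merged.csv
```

With the sample data, the last row of merged.csv should come out as ,,,,,,,,09/11,372, while the other rows are the same as in the paste output above.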

Answer 1: (score: 3)

Using GNU awk for ARGIND:

$ gawk '{ a[FNR,ARGIND]=$0; maxFnr=(FNR>maxFnr?FNR:maxFnr) }
    END {
        for (i=1;i<=maxFnr;i++) {
            for (j=1;j<ARGC;j++)
                printf "%s%s", (j==1?"":",,,"), (a[i,j]?a[i,j]:",")
            print ""
        }
    }
' file1 file2 file3
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,,,09/11,372

If you don't have GNU awk, just add an initial line that says FNR==1{ARGIND++}.
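
A minimal portable sketch of that idea, written so it behaves the same whether or not the awk in use provides ARGIND: it keeps its own file counter (fileNr is a hypothetical name, not an awk built-in) and tests array membership with `in` instead of relying on the stored line being non-empty. It assumes none of the input files is empty, since an empty file never triggers FNR == 1; the sample data is the two-column data from the question.

```shell
# Work in a scratch directory and recreate the two-column inputs from the question.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '09/05,5694' '09/06,3234' '09/08,2342' > file1
printf '%s\n' '09/05,2341' '09/06,2334' '09/09,342' > file2
printf '%s\n' '09/05,1231' '09/08,3452' '09/10,2345' '09/11,372' > file3

awk '
    FNR == 1 { fileNr++ }              # first line of each input file: advance our own counter
    {
        a[FNR, fileNr] = $0            # remember the line by (line number, file number)
        if (FNR > maxFnr) maxFnr = FNR # longest file seen so far
    }
    END {
        for (i = 1; i <= maxFnr; i++) {
            for (j = 1; j <= fileNr; j++) {
                cell = ((i, j) in a) ? a[i, j] : ","   # pad a missing line with one comma
                printf "%s%s", (j == 1 ? "" : ",,,"), cell
            }
            print ""
        }
    }
' file1 file2 file3 > merged.csv

cat merged.csv
```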

Commented version, per request:

$ gawk '
    { a[FNR,ARGIND]=$0; # Store the current line in a 2-D array `a` indexed by
                        # the current line number `FNR` and file number `ARGIND`.

      maxFnr=(FNR>maxFnr?FNR:maxFnr)    # save the max FNR value
    }
    END{
        for (i=1;i<=maxFnr;i++) {  # Loop from 1 to the max number of lines
                                   # seen in any one file and for each:
            for (j=1;j<ARGC;j++)     # Loop from 1 to total number of files parsed and:
                printf "%s%s",         # Print 2 strings, specifically:
                   (j==1?"":",,,"),      # A separator - empty if we are printing
                                         # the first file, three commas otherwise.
                   (a[i,j]?a[i,j]:",")   # The value stored in the array if it was
                                         # present in the files, a comma otherwise.
            print ""                   # Print a newline
        }
    }
' file1 file2 file3

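I was originally using an array fnr[FNR] to track the max value of FNR, but IMHO that's kind of obscure, and it had a flaw: if no lines had, say, a 2nd field, then a loop like for (i=1;i in fnr;i++) in the END section would bail out before getting to the 3rd field.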
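
A small sketch of why a loop conditioned on `i in array` is fragile: it ends at the first index that is missing, whereas a loop bounded by a tracked maximum keeps going. The array name seen and the counters stopped and bounded are made up for this illustration and do not appear in the answer's script.

```shell
result=$(awk 'BEGIN {
    seen[1] = 1; seen[3] = 1                # index 2 is deliberately missing
    for (i = 1; i in seen; i++) stopped++   # ends at the first missing index
    max = 3
    for (i = 1; i <= max; i++) bounded++    # runs up to the known maximum
    print stopped, bounded
}')
echo "$result"    # prints "1 3"
```

The first loop visits only index 1, while the bounded loop visits all three, which is the behaviour the maxFnr variable relies on.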

Answer 2: (score: 2)

Using pr:

$ pr -mts',,,' file[1-3]
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,09/11,372