AWK合并文件(后续)

时间:2018-07-04 21:59:48

标签: awk

关注this post(根据“注释”中的要求进行更新)

我从 actual (伪装的)样本数据开始,然后是在该数据上运行脚本的结果

原始数据示例:

目录中前两个文件的标题和前两行。它们是相同的数据,这不是错误。 (它可能在此数据集中发生。)

文件1:

Provider,,,,,,,,,,,,,,
02/01/2018 - 02/28/2018,,,,,,,,,,,,,,
Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
B002H5QQJA,803814064988,803814064988,P2IIPDM5MDTW,P2IIPDM5MDTW,T,Prod_P,,,,1,foo1,bar1,YDtAK,BrandX
B002H5QQTU,803814064988,803814064988,K59C4XR93JOV,K59C4XR93JOV,T,Prod_P,,,,1,foo1,bar1,kmAnC,BrandX
B002H5QR44,803814064988,803814064988,FUBOROFTLW9U,FUBOROFTLW9U,T,Prod_P,,,,1,foo1,bar1,JdLye,BrandX
B002H5QRBC,803814064988,803814064988,KMHRXLF2FRKH,KMHRXLF2FRKH,T,Prod_P,,,,1,foo1,bar1,Biqvo,BrandX
B002H5QSC0,803814064988,803814064988,PCLB5UPGGP9T,PCLB5UPGGP9T,T,Prod_P,,,,1,foo2,bar2,Iwvhe,BrandX
B002H5QU3M,505545471538,505545471538,3K4GDYDEOH1M,3K4GDYDEOH1M,T,Prod_P,,,,1,foo3,bar3,NWsOC,BrandY
B002H5QUAK,417248985349,417248985349,7R40MN9AD9I8,7R40MN9AD9I8,T,Prod_I,1,0,1,0,foo4,bar4,YVQeH,BrandY
B002H5QUBY,417248985349,417248985349,C04GQONG1Z5B,C04GQONG1Z5B,T,Prod_I,1,0,1,0,foo4,bar4,PERMW,BrandY
B002H5QUCI,505545471538,505545471538,4E1ZGIJR1GPR,4E1ZGIJR1GPR,T,Prod_P,,,,1,foo3,bar3,UycEB,BrandY
B002H5QUVO,804699101426,804699101426,51RXKMWGJJ30,51RXKMWGJJ30,T,Prod_P,,,,1,foo5,bar5,Qwyuy,BrandY
B002H5QUZ0,804699101426,804699101426,7L0QBQM8S80L,7L0QBQM8S80L,T,Prod_P,,,,1,foo5,bar5,nqgId,BrandY
B002H5QXF2,803814064988,803814064988,PH0Q5QI34B0R,PH0Q5QI34B0R,T,Prod_P,,,,1,foo6,bar6,hPFiY,BrandX
B002H5QXWK,803814064988,803814064988,PSCLFNIDVZS0,PSCLFNIDVZS0,T,Prod_P,,,,1,foo6,bar6,BCdzF,BrandX

文件2:

Provider,,,,,,,,,,,,,,,
01/01/2018 - 01/31/2018,,,,,,,,,,,,,,,
Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,,Field12,Field13,Field14,Field15
B002H5N3AA,245462777033,245462777033,CFFWR2KSWLR8,CFFWR2KSWLR8,T,Prod_P,,,,1,bar1,foo2,bar1,RkG7D,BrandY
B002H5N3IM,245462777033,245462777033,CYFTO0FGAPSJ,CYFTO0FGAPSJ,T,Prod_P,,,,1,bar1,foo2,bar1,jqiGj,BrandY
B002H5N3R8,245462777033,245462777033,8ZNJHVCVO0A1,8ZNJHVCVO0A1,T,Prod_P,,,,1,bar1,foo2,bar1,Ylrcy,BrandY
B002H5N6X4,766193337142,766193337142,37YX24TRDPNW,37YX24TRDPNW,T,Prod_P,,,,1,bar2,foo3,bar2,WHxLZ,BrandX
B002H5N756,766193337142,766193337142,H56J19KCLFZP,H56J19KCLFZP,T,Prod_P,,,,1,bar2,foo3,bar2,VVw34,BrandX
B002H5N8QO,73612604823,73612604823,HZC9P776G2EP,HZC9P776G2EP,T,Prod_P,,,,1,bar3,foo4,bar3,X48HD,BrandZ
B002H5NA3U,932053704970,932053704970,XFIB2V8RQXN4,XFIB2V8RQXN4,T,Prod_P,,,,1,bar4,foo5,bar4,ghftn,BrandY
B002H5NJ6S,245675038659,245675038659,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_I,11,2,1,0,bar5,foo6,bar5,TVY19,BrandX
B002H5NJ6S,245675038659,245675038659,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_P,,,,2,bar5,foo6,bar5,M2j1i,BrandX
B002H5NJXQ,73612604823,73612604823,RJER36PXDF0T,RJER36PXDF0T,T,Prod_P,,,,1,bar6,foo7,bar6,1UnN3,BrandY
B002H5OU5C,491559514618,491559514618,X9K6BVZEHDDZ,X9K6BVZEHDDZ,T,Prod_P,,,,1,bar7,foo8,bar7,eybpO,BrandX
B002H5OU66,491559514618,491559514618,6510BKD3XD9R,6510BKD3XD9R,T,Prod_P,,,,1,bar7,foo8,bar7,yS9xk,BrandX
B002H5OU6Q,491559514618,491559514618,EFWDVP7FPCFA,EFWDVP7FPCFA,T,Prod_P,,,,1,bar7,foo8,bar7,0IXqS,BrandX

所需的输出:

Filename,Report_Period,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
FILENAME,02/01/2018 - 02/28/2018,B002H5QQJA,8.03814E+11,8.04E+11,P2IIPDM5MDTW,P2IIPDM5MDTW,T,Prod_P,,,,1,foo1,bar1,YDtAK,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QQTU,8.03814E+11,8.04E+11,K59C4XR93JOV,K59C4XR93JOV,T,Prod_P,,,,1,foo1,bar1,kmAnC,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QR44,8.03814E+11,8.04E+11,FUBOROFTLW9U,FUBOROFTLW9U,T,Prod_P,,,,1,foo1,bar1,JdLye,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QRBC,8.03814E+11,8.04E+11,KMHRXLF2FRKH,KMHRXLF2FRKH,T,Prod_P,,,,1,foo1,bar1,Biqvo,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QSC0,8.03814E+11,8.04E+11,PCLB5UPGGP9T,PCLB5UPGGP9T,T,Prod_P,,,,1,foo2,bar2,Iwvhe,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QU3M,5.05545E+11,5.06E+11,3K4GDYDEOH1M,3K4GDYDEOH1M,T,Prod_P,,,,1,foo3,bar3,NWsOC,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUAK,4.17249E+11,4.17E+11,7R40MN9AD9I8,7R40MN9AD9I8,T,Prod_I,1,0,1,0,foo4,bar4,YVQeH,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUBY,4.17249E+11,4.17E+11,C04GQONG1Z5B,C04GQONG1Z5B,T,Prod_I,1,0,1,0,foo4,bar4,PERMW,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUCI,5.05545E+11,5.06E+11,4E1ZGIJR1GPR,4E1ZGIJR1GPR,T,Prod_P,,,,1,foo3,bar3,UycEB,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUVO,8.04699E+11,8.05E+11,51RXKMWGJJ30,51RXKMWGJJ30,T,Prod_P,,,,1,foo5,bar5,Qwyuy,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QUZ0,8.04699E+11,8.05E+11,7L0QBQM8S80L,7L0QBQM8S80L,T,Prod_P,,,,1,foo5,bar5,nqgId,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5QXF2,8.03814E+11,8.04E+11,PH0Q5QI34B0R,PH0Q5QI34B0R,T,Prod_P,,,,1,foo6,bar6,hPFiY,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5QXWK,8.03814E+11,8.04E+11,PSCLFNIDVZS0,PSCLFNIDVZS0,T,Prod_P,,,,1,foo6,bar6,BCdzF,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5N3AA,2.45463E+11,2.45E+11,CFFWR2KSWLR8,CFFWR2KSWLR8,T,Prod_P,,,,1,foo2,bar1,RkG7D,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5N3IM,2.45463E+11,2.45E+11,CYFTO0FGAPSJ,CYFTO0FGAPSJ,T,Prod_P,,,,1,foo2,bar1,jqiGj,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5N3R8,2.45463E+11,2.45E+11,8ZNJHVCVO0A1,8ZNJHVCVO0A1,T,Prod_P,,,,1,foo2,bar1,Ylrcy,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5N6X4,7.66193E+11,7.66E+11,37YX24TRDPNW,37YX24TRDPNW,T,Prod_P,,,,1,foo3,bar2,WHxLZ,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5N756,7.66193E+11,7.66E+11,H56J19KCLFZP,H56J19KCLFZP,T,Prod_P,,,,1,foo3,bar2,VVw34,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5N8QO,73612604823,73612604823,HZC9P776G2EP,HZC9P776G2EP,T,Prod_P,,,,1,foo4,bar3,X48HD,BrandZ
FILENAME,02/01/2018 - 02/28/2018,B002H5NA3U,9.32054E+11,9.32E+11,XFIB2V8RQXN4,XFIB2V8RQXN4,T,Prod_P,,,,1,foo5,bar4,ghftn,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5NJ6S,2.45675E+11,2.46E+11,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_I,11,2,1,0,foo6,bar5,TVY19,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5NJ6S,2.45675E+11,2.46E+11,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_P,,,,2,foo6,bar5,M2j1i,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5NJXQ,73612604823,73612604823,RJER36PXDF0T,RJER36PXDF0T,T,Prod_P,,,,1,foo7,bar6,1UnN3,BrandY
FILENAME,02/01/2018 - 02/28/2018,B002H5OU5C,4.9156E+11,4.92E+11,X9K6BVZEHDDZ,X9K6BVZEHDDZ,T,Prod_P,,,,1,foo8,bar7,eybpO,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5OU66,4.9156E+11,4.92E+11,6510BKD3XD9R,6510BKD3XD9R,T,Prod_P,,,,1,foo8,bar7,yS9xk,BrandX
FILENAME,02/01/2018 - 02/28/2018,B002H5OU6Q,4.9156E+11,4.92E+11,EFWDVP7FPCFA,EFWDVP7FPCFA,T,Prod_P,,,,1,foo8,bar7,0IXqS,BrandX

我的脚本(改编自我O.P.中接受的答案):

几乎可以使用。但是每个文件都包含1-3行:

gawk '
function basename(file) {
    sub(".*/", "", file)
    return file
  }
BEGIN { FS=OFS="," }
NR < 3 {
    if ( NR == 2 ) {
        hdr = "Report_Period" OFS
        val = val $1 OFS
    }
    next
}
FNR>3 {
    print "Filename", hdr $0
    next
}
{ print basename(FILENAME), val $0 }
' OFS="," /path/to/input/files/*.csv > ~/path/to/output/file/SampleOutput.csv

实际输出

这是结果文件的全部内容。问题似乎是标题在重复:

Sample1.csv,02/01/2018 - 02/28/2018,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
Filename,Report_Period,B002H5QQJA,803814064988,803814064988,P2IIPDM5MDTW,P2IIPDM5MDTW,T,Prod_P,,,,1,foo1,bar1,YDtAK,BrandX
Filename,Report_Period,B002H5QQTU,803814064988,803814064988,K59C4XR93JOV,K59C4XR93JOV,T,Prod_P,,,,1,foo1,bar1,kmAnC,BrandX
Filename,Report_Period,B002H5QR44,803814064988,803814064988,FUBOROFTLW9U,FUBOROFTLW9U,T,Prod_P,,,,1,foo1,bar1,JdLye,BrandX
Filename,Report_Period,B002H5QRBC,803814064988,803814064988,KMHRXLF2FRKH,KMHRXLF2FRKH,T,Prod_P,,,,1,foo1,bar1,Biqvo,BrandX
Filename,Report_Period,B002H5QSC0,803814064988,803814064988,PCLB5UPGGP9T,PCLB5UPGGP9T,T,Prod_P,,,,1,foo2,bar2,Iwvhe,BrandX
Filename,Report_Period,B002H5QU3M,505545471538,505545471538,3K4GDYDEOH1M,3K4GDYDEOH1M,T,Prod_P,,,,1,foo3,bar3,NWsOC,BrandY
Filename,Report_Period,B002H5QUAK,417248985349,417248985349,7R40MN9AD9I8,7R40MN9AD9I8,T,Prod_I,1,0,1,0,foo4,bar4,YVQeH,BrandY
Filename,Report_Period,B002H5QUBY,417248985349,417248985349,C04GQONG1Z5B,C04GQONG1Z5B,T,Prod_I,1,0,1,0,foo4,bar4,PERMW,BrandY
Filename,Report_Period,B002H5QUCI,505545471538,505545471538,4E1ZGIJR1GPR,4E1ZGIJR1GPR,T,Prod_P,,,,1,foo3,bar3,UycEB,BrandY
Filename,Report_Period,B002H5QUVO,804699101426,804699101426,51RXKMWGJJ30,51RXKMWGJJ30,T,Prod_P,,,,1,foo5,bar5,Qwyuy,BrandY
Filename,Report_Period,B002H5QUZ0,804699101426,804699101426,7L0QBQM8S80L,7L0QBQM8S80L,T,Prod_P,,,,1,foo5,bar5,nqgId,BrandY
Filename,Report_Period,B002H5QXF2,803814064988,803814064988,PH0Q5QI34B0R,PH0Q5QI34B0R,T,Prod_P,,,,1,foo6,bar6,hPFiY,BrandX
Filename,Report_Period,B002H5QXWK,803814064988,803814064988,PSCLFNIDVZS0,PSCLFNIDVZS0,T,Prod_P,,,,1,foo6,bar6,BCdzF,BrandX
Sample2.csv,02/01/2018 - 02/28/2018,Provider,,,,,,,,,,,,,,
Sample2.csv,02/01/2018 - 02/28/2018,01/01/2018 - 01/31/2018,,,,,,,,,,,,,,
Sample2.csv,02/01/2018 - 02/28/2018,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
Filename,Report_Period,B002H5N3AA,2.45463E+11,2.45463E+11,CFFWR2KSWLR8,CFFWR2KSWLR8,T,Prod_P,,,,1,foo2,bar1,RkG7D,BrandY
Filename,Report_Period,B002H5N3IM,2.45463E+11,2.45463E+11,CYFTO0FGAPSJ,CYFTO0FGAPSJ,T,Prod_P,,,,1,foo2,bar1,jqiGj,BrandY
Filename,Report_Period,B002H5N3R8,2.45463E+11,2.45463E+11,8ZNJHVCVO0A1,8ZNJHVCVO0A1,T,Prod_P,,,,1,foo2,bar1,Ylrcy,BrandY
Filename,Report_Period,B002H5N6X4,7.66193E+11,7.66193E+11,37YX24TRDPNW,37YX24TRDPNW,T,Prod_P,,,,1,foo3,bar2,WHxLZ,BrandX
Filename,Report_Period,B002H5N756,7.66193E+11,7.66193E+11,H56J19KCLFZP,H56J19KCLFZP,T,Prod_P,,,,1,foo3,bar2,VVw34,BrandX
Filename,Report_Period,B002H5N8QO,73612604823,73612604823,HZC9P776G2EP,HZC9P776G2EP,T,Prod_P,,,,1,foo4,bar3,X48HD,BrandZ
Filename,Report_Period,B002H5NA3U,9.32054E+11,9.32054E+11,XFIB2V8RQXN4,XFIB2V8RQXN4,T,Prod_P,,,,1,foo5,bar4,ghftn,BrandY
Filename,Report_Period,B002H5NJ6S,2.45675E+11,2.45675E+11,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_I,11,2,1,0,foo6,bar5,TVY19,BrandX
Filename,Report_Period,B002H5NJ6S,2.45675E+11,2.45675E+11,MUCSMOR5HB7V,MUCSMOR5HB7V,T,Prod_P,,,,2,foo6,bar5,M2j1i,BrandX
Filename,Report_Period,B002H5NJXQ,73612604823,73612604823,RJER36PXDF0T,RJER36PXDF0T,T,Prod_P,,,,1,foo7,bar6,1UnN3,BrandY
Filename,Report_Period,B002H5OU5C,4.9156E+11,4.9156E+11,X9K6BVZEHDDZ,X9K6BVZEHDDZ,T,Prod_P,,,,1,foo8,bar7,eybpO,BrandX
Filename,Report_Period,B002H5OU66,4.9156E+11,4.9156E+11,6510BKD3XD9R,6510BKD3XD9R,T,Prod_P,,,,1,foo8,bar7,yS9xk,BrandX
Filename,Report_Period,B002H5OU6Q,4.9156E+11,4.9156E+11,EFWDVP7FPCFA,EFWDVP7FPCFA,T,Prod_P,,,,1,foo8,bar7,0IXqS,BrandX

再次感谢!

2 个答案:

答案 0 :(得分:2)

这可能是您要尝试做的事情:

$ cat tst.awk
BEGIN { FS=OFS="," }
NR==3 {
    print "Filename", "Report_Period", $0
}
FNR==2 {
    fname = FILENAME
    sub(".*/","",fname)
    period = $1
}
FNR>3 {
    print fname, period, $0
}

我制作了自己的数据进行测试(文件#_row#_col#):

$ cat file1
Provider,,
02/01/2018 - 02/28/2018,,
Field1,Field2,Field3
f1_r1_c1,f1_r1_c2,f1_r1_c3
f1_r2_c1,f1_r2_c2,f1_r2_c3

$ cat file2
Provider,,
01/01/2018 - 01/31/2018,,
Field1,Field2,Field3
f2_r1_c1,f2_r1_c2,f2_r1_c3
f2_r2_c1,f2_r2_c2,f2_r2_c3

$ awk -f tst.awk file1 file2
Filename,Report_Period,Field1,Field2,Field3
file1,02/01/2018 - 02/28/2018,f1_r1_c1,f1_r1_c2,f1_r1_c3
file1,02/01/2018 - 02/28/2018,f1_r2_c1,f1_r2_c2,f1_r2_c3
file2,01/01/2018 - 01/31/2018,f2_r1_c1,f2_r1_c2,f2_r1_c3
file2,01/01/2018 - 01/31/2018,f2_r2_c1,f2_r2_c2,f2_r2_c3

如果这不是您想要的,请编辑您的问题以阐明您的要求,并使用最少的示例输入/输出。

答案 1 :(得分:1)

尝试两个。忘记包含标题。

(pi 580) $ cat /tmp/x.sh
#!/bin/sh

gawk '
  BEGIN {FS=OFS=","}
  FNR == 1 {file=FILENAME; sub(".*/", "", file); next}
  FNR == 2 {period=$1; next}
  NR  == 3 {print "file","period",$0; next}
  FNR == 3 {next}
  {print file,period,$0}
' $*

(pi 581) $ /tmp/x.sh /tmp/f?.*
file,period,Field1,Field2,Field3,Field4,Field5,Field6,Field7,Field8,Field9,Field10,Field11,Field12,Field13,Field14,Field15
f1.txt,02/01/2018 - 02/28/2018,B002H5QQJA,803814064988,803814064988,P2IIPDM5MDTW,P2IIPDM5MDTW,T,Prod_P,,,,1,foo1,bar1,YDtAK,BrandX
f1.txt,02/01/2018 - 02/28/2018,B002H5QQTU,803814064988,803814064988,K59C4XR93JOV,K59C4XR93JOV,T,Prod_P,,,,1,foo1,bar1,kmAnC,BrandX
f2.txt,01/01/2018 - 01/31/2018,B002H5N3AA,245462777033,245462777033,CFFWR2KSWLR8,CFFWR2KSWLR8,T,Prod_P,,,,1,bar1,foo2,bar1,RkG7D,BrandY
f2.txt,01/01/2018 - 01/31/2018,B002H5N3IM,245462777033,245462777033,CYFTO0FGAPSJ,CYFTO0FGAPSJ,T,Prod_P,,,,1,bar1,foo2,bar1,jqiGj,BrandY