How do I concatenate 2 files following a pattern?

Time: 2015-09-28 14:42:51

Tags: bash shell paste cat

All I want to do is concatenate 2 files, as in the following example:

file 1        file 2
C1            O1             
C3            O3
..            O5
              O7
              O9
              O11
              O13
              O15
              O17
              O19
              ..

The desired output file is:

file 3
C1
O1
O9
O17
C3
O3
O11
O19
..
..

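So the pattern is: first C1 with O1, then skip 3 lines in file 2 (so print O9), then skip another 3 lines in file 2 (so print O17). Then print C3 and O3, skip 3 lines (O11), skip 3 lines (O19); then C5, and so on.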

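I tried doing something with cat | paste - - - ..., but it didn't work :(

Any suggestions?

Thanks a lot in advance

Edit

I forgot to mention that these are large files. :)

Here are my input files: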

cat file 1
C             18     -2.182951850        -0.000000000        -6.517815410
C             20     -4.127401075         0.000000000        -0.446529291
C             22     -3.314258919        -2.494999886       -15.624910016
C             24     -6.071850300         0.000000000         5.624757806
C             26     -2.023950100         0.000000000         5.624757806
C             28     -4.286402584        -0.000000000       -12.589102506
C             30     -6.230851809        -0.000000000        -6.517815410
C             32     -0.079500634         0.000000000        -0.446529291

cat file 2
O             34     -1.393125174        -0.640765928        -5.738276269
O             36     -3.337574640        -0.640765928         0.333010828
O             38     -2.524270589         1.854234106       -14.845370570
O             40     -5.282024106        -0.640765928         6.404297925
O             42     -2.182951850         1.281531856        -6.517815410
O             44     -4.127401075         1.281531856        -0.446529291
O             46     -3.314258919        -1.213468178       -15.624910016
O             48     -6.071850300         1.281531856         5.624757806
O             50     -2.972778044        -0.640765928        -7.297355528
O             52     -4.917227269        -0.640765928        -1.226068432
O             54     -4.104085113         1.854234106       -16.404449463
O             56     -6.861676614        -0.640765928         4.845217687
O             58     -2.813776294         0.640765779         4.845217687
O             60     -5.076228778         0.640765779       -13.368642136
O             62     -7.020678123         0.640765779        -7.297355528
O             64     -0.869326828         0.640765779        -1.226068432
O             66     -2.023950100        -1.281531708         5.624757806
O             68     -4.286402584        -1.281531708       -12.589102506
O             70     -6.230851809        -1.281531708        -6.517815410
O             72     -0.079500634        -1.281531708        -0.446529291
O             74     -1.234123906         0.640765779         6.404297925
O             76     -3.496576390         0.640765779       -11.809563365
O             78     -5.441025615         0.640765779        -5.738276269
O             80      0.710325077         0.640765779         0.333010828

C18 must be followed by O34, O42 and O50. Then C20 is followed by O36, O44 and O52, and so on:

cat file 3
C             18     -2.182951850        -0.000000000        -6.517815410 
O             34     -1.393125174        -0.640765928        -5.738276269
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
C             20     -4.127401075         0.000000000        -0.446529291
O             36     -3.337574640        -0.640765928         0.333010828
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
..             ..      ............        .............       .........

The output produced by Tom's code is:

Tom output
C             18     -2.182951850        -0.000000000        -6.517815410
O             34     -1.393125174        -0.640765928        -5.738276269
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
O             58     -2.813776294         0.640765779         4.845217687
O             66     -2.023950100        -1.281531708         5.624757806
O             74     -1.234123906         0.640765779         6.404297925
C             20     -4.127401075         0.000000000        -0.446529291
O             36     -3.337574640        -0.640765928         0.333010828
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
O             60     -5.076228778         0.640765779       -13.368642136
O             68     -4.286402584        -1.281531708       -12.589102506
O             76     -3.496576390         0.640765779       -11.809563365
C             22     -3.314258919        -2.494999886       -15.624910016
O             38     -2.524270589         1.854234106       -14.845370570
O             46     -3.314258919        -1.213468178       -15.624910016
O             54     -4.104085113         1.854234106       -16.404449463
O             62     -7.020678123         0.640765779        -7.297355528
O             70     -6.230851809        -1.281531708        -6.517815410
O             78     -5.441025615         0.640765779        -5.738276269
and     so   on

Any suggestions?

Thanks

2 answers:

Answer 0: (score: 2)

I would suggest using awk to do this:

# first file
NR == FNR { 
    a[NR] = $0  # save each line into array
    ++len
    next        # skip further blocks
}

{ b[FNR] = $0 } # save each line from 2nd file into array

END {
    # loop through and print
    for (i = 1; i <= len; ++i) {
        print a[i]
        for (j = i; j <= FNR; j += 4) print b[j]
    }
}

The script can be run like awk -f script.awk file1 file2.
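The desired file 3 in the question has three O lines per C line, whereas the inner loop above keeps stepping until the end of file 2. A minimal sketch of a capped variant follows. The function name interleave and its step and count parameters are made up for illustration, and it assumes the same stride applies to every C line, which the question only shows for the first few lines.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): print each line of
# file1 followed by at most `count` lines of file2, starting at the same
# line number and stepping by `step`.
# Usage: interleave file1 file2 [step] [count]
interleave() {
    awk -v step="${3:-4}" -v count="${4:-3}" '
        NR == FNR { a[++len] = $0; next }   # first file: buffer the C lines
        { b[FNR] = $0; n = FNR }            # second file: buffer the O lines
        END {
            for (i = 1; i <= len; ++i) {
                print a[i]
                # stop after `count` O lines, and never read past file2
                for (k = 0; k < count && i + k * step <= n; ++k)
                    print b[i + k * step]
            }
        }
    ' "$1" "$2"
}
```

It could be called as interleave file1 file2 > file3. Like the script above, it holds both files in memory, which may matter for large inputs.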

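Answer 1: (score: 1)

What you have described (as confirmed in the comments) is a pattern that consists of:

  • a C line
  • a sampling from a set of 9 O lines, starting at the same offset as the C line.

To handle this, I would use awk with a 9-line "sliding window" as a buffer.

Rather than using Tom's solution, which feeds both files to awk sequentially and reads them into arrays, I would suggest reading from both files at the same time, so that you don't use so much memory holding the arrays.

Here is what I mean, as a one-liner: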

awk '{a[NR]=$0;delete a[NR-10];} NR>9{getline Cline < "fileC";print Cline;print a[NR-9]; print a[NR-5]; print a[NR-1];}' fileO

Broken out for easier reading (and commenting), this looks like:

awk '
  {
    a[NR]=$0;        # Store our current "O" line in an array
    delete a[NR-10]; # Clean the array as we step through the file
  }

  NR>9 {
    getline Cline < "fileC";  # Get the next "C" line...
    print Cline;              # ... and print it
    print a[NR-9];            # \ 
    print a[NR-5];            #  > Print the three "O" lines for this 
    print a[NR-1];            # /
  }
' fileO

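Note that you need to have the right number of "O" lines, because the last set of "O" lines is not printed if it is incomplete.

With the sample data, the output looks like this: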

C             18     -2.182951850        -0.000000000        -6.517815410
O             34     -1.393125174        -0.640765928        -5.738276269
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
C             20     -4.127401075         0.000000000        -0.446529291
O             36     -3.337574640        -0.640765928         0.333010828
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
C             22     -3.314258919        -2.494999886       -15.624910016
O             38     -2.524270589         1.854234106       -14.845370570
O             46     -3.314258919        -1.213468178       -15.624910016
O             54     -4.104085113         1.854234106       -16.404449463
C             24     -6.071850300         0.000000000         5.624757806
O             40     -5.282024106        -0.640765928         6.404297925
O             48     -6.071850300         1.281531856         5.624757806
O             56     -6.861676614        -0.640765928         4.845217687
C             26     -2.023950100         0.000000000         5.624757806
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
O             58     -2.813776294         0.640765779         4.845217687
C             28     -4.286402584        -0.000000000       -12.589102506
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
O             60     -5.076228778         0.640765779       -13.368642136
C             30     -6.230851809        -0.000000000        -6.517815410
O             46     -3.314258919        -1.213468178       -15.624910016
O             54     -4.104085113         1.854234106       -16.404449463
O             62     -7.020678123         0.640765779        -7.297355528
C             32     -0.079500634         0.000000000        -0.446529291
O             48     -6.071850300         1.281531856         5.624757806
O             56     -6.861676614        -0.640765928         4.845217687
O             64     -0.869326828         0.640765779        -1.226068432
C             32     -0.079500634         0.000000000        -0.446529291
O             50     -2.972778044        -0.640765928        -7.297355528
O             58     -2.813776294         0.640765779         4.845217687
O             66     -2.023950100        -1.281531708         5.624757806
C             32     -0.079500634         0.000000000        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
O             60     -5.076228778         0.640765779       -13.368642136
O             68     -4.286402584        -1.281531708       -12.589102506
C             32     -0.079500634         0.000000000        -0.446529291
O             54     -4.104085113         1.854234106       -16.404449463
O             62     -7.020678123         0.640765779        -7.297355528
O             70     -6.230851809        -1.281531708        -6.517815410
C             32     -0.079500634         0.000000000        -0.446529291
O             56     -6.861676614        -0.640765928         4.845217687
O             64     -0.869326828         0.640765779        -1.226068432
O             72     -0.079500634        -1.281531708        -0.446529291
C             32     -0.079500634         0.000000000        -0.446529291
O             58     -2.813776294         0.640765779         4.845217687
O             66     -2.023950100        -1.281531708         5.624757806
O             74     -1.234123906         0.640765779         6.404297925
C             32     -0.079500634         0.000000000        -0.446529291
O             60     -5.076228778         0.640765779       -13.368642136
O             68     -4.286402584        -1.281531708       -12.589102506
O             76     -3.496576390         0.640765779       -11.809563365
C             32     -0.079500634         0.000000000        -0.446529291
O             62     -7.020678123         0.640765779        -7.297355528
O             70     -6.230851809        -1.281531708        -6.517815410
O             78     -5.441025615         0.640765779        -5.738276269

Is this what you had in mind?
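The repeated C 32 lines at the end of that output appear because getline fails once fileC is exhausted and Cline keeps its previous value. A minimal sketch of the same sliding window that stops when the C file runs out is shown here. The function name window_join and the variable cfile are made up for illustration, and stopping at the end of the C file is an assumption about the intended behaviour.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): stream the O file
# through a sliding window and pair each window with the next C line.
# Usage: window_join fileC fileO
window_join() {
    awk -v cfile="$1" '
        {
            a[NR] = $0          # store the current O line
            delete a[NR - 10]   # drop lines that fell out of the window
        }
        NR > 9 {
            # getline returns 0 at end of file and -1 on error: stop then
            if ((getline cline < cfile) <= 0) exit
            print cline
            print a[NR - 9]
            print a[NR - 5]
            print a[NR - 1]
        }
    ' "$2"
}
```

With the sample files this should produce only the first eight groups of the output above, ending with the group for C 32.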