How can I sort these two files using a specific pattern?

Time: 2015-11-25 16:30:51

Tags: bash loops awk

I have two files, like this:

cat file 1
C             18     -2.182951850        -0.000000000        -6.517815410
C             20     -4.127401075         0.000000000        -0.446529291
C             22     -3.314258919        -2.494999886       -15.624910016
C             24     -6.071850300         0.000000000         5.624757806
C             26     -2.023950100         0.000000000         5.624757806
C             28     -4.286402584        -0.000000000       -12.589102506
C             30     -6.230851809        -0.000000000        -6.517815410
C             32     -0.079500634         0.000000000        -0.446529291
..            ..     ............         ...........        ............
cat file 2
O             34     -1.393125174        -0.640765928        -5.738276269
O             36     -3.337574640        -0.640765928         0.333010828
O             38     -2.524270589         1.854234106       -14.845370570
O             40     -5.282024106        -0.640765928         6.404297925
O             42     -2.182951850         1.281531856        -6.517815410
O             44     -4.127401075         1.281531856        -0.446529291
O             46     -3.314258919        -1.213468178       -15.624910016
O             48     -6.071850300         1.281531856         5.624757806
O             50     -2.972778044        -0.640765928        -7.297355528
O             52     -4.917227269        -0.640765928        -1.226068432
O             54     -4.104085113         1.854234106       -16.404449463
O             56     -6.861676614        -0.640765928         4.845217687
O             58     -2.813776294         0.640765779         4.845217687
O             60     -5.076228778         0.640765779       -13.368642136
O             62     -7.020678123         0.640765779        -7.297355528
O             64     -0.869326828         0.640765779        -1.226068432
O             66     -2.023950100        -1.281531708         5.624757806
O             68     -4.286402584        -1.281531708       -12.589102506
O             70     -6.230851809        -1.281531708        -6.517815410
O             72     -0.079500634        -1.281531708        -0.446529291
O             74     -1.234123906         0.640765779         6.404297925
O             76     -3.496576390         0.640765779       -11.809563365
O             78     -5.441025615         0.640765779        -5.738276269
O             80      0.710325077         0.640765779         0.333010828
...           ...     ...........         ...........         ...........

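I want to combine these two files using the following pattern: the C18 line is followed by the O34 line; then, after skipping 3 lines in file 2, the O42 line; then, after skipping another 3 lines in file 2, the O50 line.

Next come C20, O36, O44 and O52. As you may have noticed, after 4 iterations this pattern would start repeating O lines, so at C26 I want to jump to the O58 line and continue from there with the lines that follow. The files are very large, so all I need is something that makes this jump after every 4 iterations. For clarity, here is the output file I want: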

cat file 3
C             18     -2.182951850        -0.000000000        -6.517815410 
O             34     -1.393125174        -0.640765928        -5.738276269
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
C             20     -4.127401075         0.000000000        -0.446529291
O             36     -3.337574640        -0.640765928         0.333010828
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
C             22     -3.314258919        -2.494999886       -15.624910016
O             38     -2.524270589         1.854234106       -14.845370570
O             46     -3.314258919        -1.213468178       -15.624910016
O             54     -4.104085113         1.854234106       -16.404449463
C             24     -6.071850300         0.000000000         5.624757806
O             40     -5.282024106        -0.640765928         6.404297925
O             48     -6.071850300         1.281531856         5.624757806
O             56     -6.861676614        -0.640765928         4.845217687
here comes the problem!!
C             26     -2.023950100         0.000000000         5.624757806
O             58     -2.813776294         0.640765779         4.845217687
O             66     -2.023950100        -1.281531708         5.624757806
O             74     -1.234123906         0.640765779         6.404297925
C             28     -4.286402584        -0.000000000       -12.589102506
O             60     -5.076228778         0.640765779       -13.368642136
O             68     -4.286402584        -1.281531708       -12.589102506
O             76     -3.496576390         0.640765779       -11.809563365
..             ..      ............        .............       .........

This is the code I use when I want the O lines to repeat:

# first file
NR == FNR { 
    a[NR] = $0  # save each line into array
    ++len
    next        # skip further blocks
}

{ b[FNR] = $0 } # save each line from 2nd file into array

END {
    # loop through and print
    for (i = 1; i <= len; ++i) {
        print a[i]
        for (j = i; j <= FNR; j += 4) print b[j]
    }
}

and I run it with awk -f script.awk file1 file2

Thank you very much in advance.

1 Answer:

Answer 0: (score: 1)

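If the files are large, working entirely in memory may not be feasible. Here I suggest a two-stage approach: first reorder the file2 lines into the order in which they appear in the desired output, then merge file1 and file2 at a ratio of 1:3 lines.

For example: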

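$ awk '{ a[(NR-1)%12] = $0 }            # buffer lines in blocks of 12
  NR%12 == 0 {
      for (i = 0; i < 4; i++)           # 4 groups per block
          for (j = 0; j < 3; j++)       # 3 lines per group, stride 4
              print a[i+j*4]
      delete a                          # reset the buffer for the next block
  }' <(seq 1 24)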
1
5
9
2
6
10
3
7
11
4
8
12
13
17
21
14
18
22
15
19
23
16
20
24

This puts the file2 lines in the correct order. Combining it with the merge step

awk '{print; for(i=1;i<=3;i++) {getline x < "file2_reordered"; print x}}' file1

will give you the desired output.
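Putting the two stages together might look like the sketch below. The demo inputs (C1..C8 and O1..O24 in a scratch directory) and the variable names dir and f are assumptions for illustration only; substitute your real file1 and file2. The getline return value is checked so that a short file2 does not print a stale line.

```shell
# Hypothetical demo inputs in a scratch directory; substitute the real files.
dir=$(mktemp -d)
seq 1 8  | sed 's/^/C/' > "$dir/file1"    # C1 .. C8
seq 1 24 | sed 's/^/O/' > "$dir/file2"    # O1 .. O24

# Stage 1: reorder file2 in blocks of 12 lines (4 groups of 3 lines, stride 4).
awk '{ a[(NR-1)%12] = $0 }
     NR%12 == 0 {
         for (i = 0; i < 4; i++)
             for (j = 0; j < 3; j++)
                 print a[i+j*4]
         delete a
     }' "$dir/file2" > "$dir/file2_reordered"

# Stage 2: after each file1 line, print the next 3 reordered file2 lines.
awk -v f="$dir/file2_reordered" '{
    print
    for (i = 1; i <= 3; i++)
        if ((getline x < f) > 0) print x   # skip if file2 ran out
}' "$dir/file1" > "$dir/file3"

head -n 4 "$dir/file3"
```

With these demo inputs the first four lines of file3 are C1, O1, O5, O9, and the fifth C line is followed by O13, O17, O21, which is the jump asked for in the question.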

PS. This line reordering is similar to transposing a sequence of 3x4 matrices (treating each line as one element).
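To make the transpose view concrete, here is a small sketch that prints only the line numbers of one block. The names rows, cols and out are illustrative and not part of the scripts above; the 12 lines of a block are assumed to be laid out row by row in a 3x4 matrix, and reading that matrix column by column gives the output order.

```shell
out=$(awk 'BEGIN {
    rows = 3; cols = 4                 # one 12-line block as a 3x4 matrix, stored row by row
    for (c = 0; c < cols; c++) {       # walking column by column is the transpose
        for (r = 0; r < rows; r++)
            printf "%s%d", (r ? " " : ""), r * cols + c + 1
        print ""
    }
}')
echo "$out"
```

This prints the four groups 1 5 9, 2 6 10, 3 7 11 and 4 8 12, one per line, matching the first 12 lines of the seq demonstration above.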

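Update: Thinking about the problem the other way round, you can intersperse the file1 contents while processing file2. That gives a single script.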

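$ awk '{ a[(NR-1)%12] = $0 }                 # buffer file2 lines in blocks of 12
  NR%12 == 0 {
      for (i = 0; i < 4; i++) {
          getline x < "file1"; print x       # next C line from file1
          for (j = 0; j < 3; j++)
              print a[i+j*4]                 # its 3 O lines, stride 4
      }
      delete a                               # reset the buffer for the next block
  }' file2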
C             18     -2.182951850        -0.000000000        -6.517815410
O             34     -1.393125174        -0.640765928        -5.738276269
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
C             20     -4.127401075         0.000000000        -0.446529291
O             36     -3.337574640        -0.640765928         0.333010828
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
C             22     -3.314258919        -2.494999886       -15.624910016
O             38     -2.524270589         1.854234106       -14.845370570
O             46     -3.314258919        -1.213468178       -15.624910016
O             54     -4.104085113         1.854234106       -16.404449463
C             24     -6.071850300         0.000000000         5.624757806
O             40     -5.282024106        -0.640765928         6.404297925
O             48     -6.071850300         1.281531856         5.624757806
O             56     -6.861676614        -0.640765928         4.845217687
C             26     -2.023950100         0.000000000         5.624757806
O             58     -2.813776294         0.640765779         4.845217687
O             66     -2.023950100        -1.281531708         5.624757806
O             74     -1.234123906         0.640765779         6.404297925
C             28     -4.286402584        -0.000000000       -12.589102506
O             60     -5.076228778         0.640765779       -13.368642136
O             68     -4.286402584        -1.281531708       -12.589102506
O             76     -3.496576390         0.640765779       -11.809563365
C             30     -6.230851809        -0.000000000        -6.517815410
O             62     -7.020678123         0.640765779        -7.297355528
O             70     -6.230851809        -1.281531708        -6.517815410
O             78     -5.441025615         0.640765779        -5.738276269
C             32     -0.079500634         0.000000000        -0.446529291
O             64     -0.869326828         0.640765779        -1.226068432
O             72     -0.079500634        -1.281531708        -0.446529291
O             80      0.710325077         0.640765779         0.333010828
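If the group sizes might change, the same single-pass idea can be parameterized. The following is only a sketch under assumptions: groups, per, block and cfile are illustrative names, the demo inputs are made up, and file2 is assumed to contain a whole number of blocks (a trailing partial block is silently dropped, as in the script above).

```shell
# Hypothetical demo inputs in a scratch directory; substitute the real files.
dir=$(mktemp -d)
seq 1 8  | sed 's/^/C/' > "$dir/file1"    # C1 .. C8
seq 1 24 | sed 's/^/O/' > "$dir/file2"    # O1 .. O24

awk -v groups=4 -v per=3 -v cfile="$dir/file1" '
    BEGIN { block = groups * per }            # 12 file2 lines per block
    { a[(NR-1) % block] = $0 }                # buffer the current block
    NR % block == 0 {
        for (i = 0; i < groups; i++) {
            # print the next file1 line only if one was actually read
            if ((getline x < cfile) > 0) print x
            for (j = 0; j < per; j++)
                print a[i + j*groups]
        }
        delete a
    }' "$dir/file2" > "$dir/file3"

sed -n '17,20p' "$dir/file3"
```

With these demo inputs, lines 17 to 20 of file3 are C5, O13, O17, O21. Only one block of file2 is held in memory at a time, so the size of the files should not matter.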