Using awk to extract only runs of consecutive lines with distinct strings in one column and the same value in another column

Date: 2014-06-15 10:43:13

Tags: awk

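Is it possible to use awk to extract only the runs of consecutive lines (20 in this example) that have distinct strings in column 3 (from C to H13) and the same value in column 5? The file is structured as follows: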

............................................... ..........................

LINE    564  C   LESS L3782     246.617 200.380  10.086  1.00  0.00      L     
LINE    565  C1  LESS L3782     247.525 201.163   9.136  1.00  0.00      L     
LINE    566  C2  LESS L3782     247.265 202.663   9.269  1.00  0.00      L     
LINE    567  C3  LESS L3782     249.012 200.776   9.298  1.00  0.00      L     
LINE    568  C4  LESS L3782     249.659 201.089  10.654  1.00  0.00      L     
LINE    569  C5  LESS L3782     251.029 200.429  10.766  1.00  0.00      L     
LINE    570  O   LESS L3782     249.832 202.495  10.789  1.00  0.00      L     
LINE    571  H   LESS L3782     246.797 199.303   9.997  1.00  0.00      L     
LINE    572  H1  LESS L3782     246.772 200.668  11.130  1.00  0.00      L         
LINE    592  C   LESS L3818     134.617 208.380  10.086  1.00  0.00      L     
LINE    593  C1  LESS L3818     135.525 209.163   9.136  1.00  0.00      L     
LINE    594  C2  LESS L3818     135.265 210.663   9.269  1.00  0.00      L     
LINE    595  C3  LESS L3818     137.012 208.776   9.298  1.00  0.00      L     
LINE    596  C4  LESS L3818     137.659 209.089  10.654  1.00  0.00      L     
LINE    597  C5  LESS L3818     139.029 208.429  10.766  1.00  0.00      L     
LINE    598  O   LESS L3818     137.832 210.495  10.789  1.00  0.00      L     
LINE    599  H   LESS L3818     134.797 207.303   9.997  1.00  0.00      L     
LINE    600  H1  LESS L3818     134.772 208.668  11.130  1.00  0.00      L     
LINE    601  H2  LESS L3818     133.564 208.562   9.845  1.00  0.00      L     
LINE    602  H3  LESS L3818     135.242 208.879   8.114  1.00  0.00      L     
LINE    603  H4  LESS L3818     135.381 211.008  10.301  1.00  0.00      L     
LINE    604  H5  LESS L3818     134.241 210.901   8.961  1.00  0.00      L     
LINE    605  H6  LESS L3818     135.946 211.237   8.632  1.00  0.00      L     
LINE    606  H7  LESS L3818     137.579 209.288   8.508  1.00  0.00      L     
LINE    607  H8  LESS L3818     137.099 207.700   9.100  1.00  0.00      L     
LINE    608  H9  LESS L3818     137.027 208.740  11.477  1.00  0.00      L     
LINE    609  H10 LESS L3818     138.225 210.662  11.662  1.00  0.00      L     
LINE    610  H11 LESS L3818     139.496 208.674  11.726  1.00  0.00      L     
LINE    611  H12 LESS L3818     138.955 207.340  10.685  1.00  0.00      L     
LINE    612  H13 LESS L3818     139.705 208.795   9.985  1.00  0.00      L        
LINE    618  C5  LESS L3832     251.029 208.429  10.766  1.00  0.00      L     
LINE    619  O   LESS L3832     249.832 210.495  10.789  1.00  0.00      L     
LINE    620  H   LESS L3832     246.797 207.303   9.997  1.00  0.00      L     
LINE    621  H1  LESS L3832     246.772 208.668  11.130  1.00  0.00      L     
LINE    622  H2  LESS L3832     245.564 208.562   9.845  1.00  0.00      L     
LINE    626  H6  LESS L3832     247.946 211.237   8.632  1.00  0.00      L     
LINE    627  H7  LESS L3832     249.579 209.288   8.508  1.00  0.00      L     
LINE    628  H8  LESS L3832     249.099 207.700   9.100  1.00  0.00      L     
LINE    629  H9  LESS L3832     249.027 208.740  11.477  1.00  0.00      L     
LINE    630  H10 LESS L3832     250.225 210.662  11.662  1.00  0.00      L     
LINE    631  H11 LESS L3832     251.496 208.674  11.726  1.00  0.00      L     
LINE    632  H12 LESS L3832     250.955 207.340  10.685  1.00  0.00      L     
LINE    633  H13 LESS L3832     251.705 208.795   9.985  1.00  0.00      L     
LINE    638  C   LESS L3868     134.617 216.380  10.086  1.00  0.00      L     
LINE    639  C1  LESS L3868     135.525 217.163   9.136  1.00  0.00      L     
LINE    640  C2  LESS L3868     135.265 218.663   9.269  1.00  0.00      L     
LINE    641  C3  LESS L3868     137.012 216.776   9.298  1.00  0.00      L     
LINE    642  C4  LESS L3868     137.659 217.089  10.654  1.00  0.00      L     
LINE    643  C5  LESS L3868     139.029 216.429  10.766  1.00  0.00      L     
LINE    644  O   LESS L3868     137.832 218.495  10.789  1.00  0.00      L     
LINE    645  H   LESS L3868     134.797 215.303   9.997  1.00  0.00      L     
LINE    646  H1  LESS L3868     134.772 216.668  11.130  1.00  0.00      L     
LINE    647  H2  LESS L3868     133.564 216.562   9.845  1.00  0.00      L     
LINE    648  H3  LESS L3868     135.242 216.879   8.114  1.00  0.00      L     
LINE    649  H4  LESS L3868     135.381 219.008  10.301  1.00  0.00      L     
LINE    650  H5  LESS L3868     134.241 218.901   8.961  1.00  0.00      L     
LINE    651  H6  LESS L3868     135.946 219.237   8.632  1.00  0.00      L     
LINE    652  H7  LESS L3868     137.579 217.288   8.508  1.00  0.00      L     
LINE    653  H8  LESS L3868     137.099 215.700   9.100  1.00  0.00      L     
LINE    654  H9  LESS L3868     137.027 216.740  11.477  1.00  0.00      L     
LINE    655  H10 LESS L3868     138.225 218.662  11.662  1.00  0.00      L     
LINE    656  H11 LESS L3868     139.496 216.674  11.726  1.00  0.00      L     
LINE    657  H12 LESS L3868     138.955 215.340  10.685  1.00  0.00      L     
LINE    658  H13 LESS L3868     139.705 216.795   9.985  1.00  0.00      L     
LINE    677  O   LESS L3882     249.832 218.495  10.789  1.00  0.00      L     
LINE    678  H   LESS L3882     246.797 215.303   9.997  1.00  0.00      L     
LINE    679  H1  LESS L3882     246.772 216.668  11.130  1.00  0.00      L     
LINE    680  H2  LESS L3882     245.564 216.562   9.845  1.00  0.00      L     
.........................................................................

so that the output looks like this:

LINE    592  C   LESS L3818     134.617 208.380  10.086  1.00  0.00      L     
LINE    593  C1  LESS L3818     135.525 209.163   9.136  1.00  0.00      L     
LINE    594  C2  LESS L3818     135.265 210.663   9.269  1.00  0.00      L     
LINE    595  C3  LESS L3818     137.012 208.776   9.298  1.00  0.00      L     
LINE    596  C4  LESS L3818     137.659 209.089  10.654  1.00  0.00      L     
LINE    597  C5  LESS L3818     139.029 208.429  10.766  1.00  0.00      L     
LINE    598  O   LESS L3818     137.832 210.495  10.789  1.00  0.00      L     
LINE    599  H   LESS L3818     134.797 207.303   9.997  1.00  0.00      L     
LINE    600  H1  LESS L3818     134.772 208.668  11.130  1.00  0.00      L     
LINE    601  H2  LESS L3818     133.564 208.562   9.845  1.00  0.00      L     
LINE    602  H3  LESS L3818     135.242 208.879   8.114  1.00  0.00      L     
LINE    603  H4  LESS L3818     135.381 211.008  10.301  1.00  0.00      L     
LINE    604  H5  LESS L3818     134.241 210.901   8.961  1.00  0.00      L     
LINE    605  H6  LESS L3818     135.946 211.237   8.632  1.00  0.00      L     
LINE    606  H7  LESS L3818     137.579 209.288   8.508  1.00  0.00      L     
LINE    607  H8  LESS L3818     137.099 207.700   9.100  1.00  0.00      L     
LINE    608  H9  LESS L3818     137.027 208.740  11.477  1.00  0.00      L     
LINE    609  H10 LESS L3818     138.225 210.662  11.662  1.00  0.00      L     
LINE    610  H11 LESS L3818     139.496 208.674  11.726  1.00  0.00      L     
LINE    611  H12 LESS L3818     138.955 207.340  10.685  1.00  0.00      L     
LINE    612  H13 LESS L3818     139.705 208.795   9.985  1.00  0.00      L        
LINE    638  C   LESS L3868     134.617 216.380  10.086  1.00  0.00      L     
LINE    639  C1  LESS L3868     135.525 217.163   9.136  1.00  0.00      L     
LINE    640  C2  LESS L3868     135.265 218.663   9.269  1.00  0.00      L     
LINE    641  C3  LESS L3868     137.012 216.776   9.298  1.00  0.00      L     
LINE    642  C4  LESS L3868     137.659 217.089  10.654  1.00  0.00      L     
LINE    643  C5  LESS L3868     139.029 216.429  10.766  1.00  0.00      L     
LINE    644  O   LESS L3868     137.832 218.495  10.789  1.00  0.00      L     
LINE    645  H   LESS L3868     134.797 215.303   9.997  1.00  0.00      L     
LINE    646  H1  LESS L3868     134.772 216.668  11.130  1.00  0.00      L     
LINE    647  H2  LESS L3868     133.564 216.562   9.845  1.00  0.00      L     
LINE    648  H3  LESS L3868     135.242 216.879   8.114  1.00  0.00      L     
LINE    649  H4  LESS L3868     135.381 219.008  10.301  1.00  0.00      L     
LINE    650  H5  LESS L3868     134.241 218.901   8.961  1.00  0.00      L     
LINE    651  H6  LESS L3868     135.946 219.237   8.632  1.00  0.00      L     
LINE    652  H7  LESS L3868     137.579 217.288   8.508  1.00  0.00      L     
LINE    653  H8  LESS L3868     137.099 215.700   9.100  1.00  0.00      L     
LINE    654  H9  LESS L3868     137.027 216.740  11.477  1.00  0.00      L     
LINE    655  H10 LESS L3868     138.225 218.662  11.662  1.00  0.00      L     
LINE    656  H11 LESS L3868     139.496 216.674  11.726  1.00  0.00      L     
LINE    657  H12 LESS L3868     138.955 215.340  10.685  1.00  0.00      L     
LINE    658  H13 LESS L3868     139.705 216.795   9.985  1.00  0.00      L 

Thank you, Alin

2 answers:

Answer 0 (score: 0)

The currently accepted answer by Steve is a very long-winded way of writing:

awk '{if (a[$3,$5]++ == 0) print}'

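Strictly speaking, this does not take contiguity into account; if more entries for L3818 appeared further down the file, it would still remember the entries seen near the top. If that is a problem, you can use:

awk '{if ($5 != old_5) {delete a; old_5 = $5} if (a[$3,$5]++ == 0) print}'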
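
To see how the two approaches differ, here is a small sketch on a made-up five-line input. The sample rows and the function names `sample`, `dedupe_global` and `dedupe_per_run` are illustrative only and do not come from the question. In the sample, L3818 shows up again after an L3832 line:

```shell
# Hypothetical input: L3818 reappears after an L3832 line,
# and the last line repeats a column-3 value within its run.
sample() {
  printf '%s\n' \
    'LINE 1 C  LESS L3818' \
    'LINE 2 C1 LESS L3818' \
    'LINE 3 C  LESS L3832' \
    'LINE 4 C  LESS L3818' \
    'LINE 5 C  LESS L3818'
}

# Global de-duplication: a (column 3, column 5) pair is printed
# only the first time it is seen anywhere in the input.
dedupe_global() {
  awk '{ if (a[$3,$5]++ == 0) print }'
}

# Per-run de-duplication: the array is cleared whenever column 5
# changes, so earlier runs with the same column 5 are forgotten.
dedupe_per_run() {
  awk '{ if ($5 != old_5) { delete a; old_5 = $5 } if (a[$3,$5]++ == 0) print }'
}

sample | dedupe_global
echo '---'
sample | dedupe_per_run
```

The first pipeline prints lines 1 to 3, because the later L3818 rows count as already seen. The second also prints line 4, because the array is emptied each time column 5 changes; line 5 is still dropped as a repeat within its own run.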

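Answer 1 (score: -1)

Yes. Like Perl, AWK is a data extraction and reporting tool. You can use an array to check whether the strings in the third column are unique. You can also use a variable to store and check the value of the fifth column.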

awk -v n=20 '{ r = (r ? r RS : "") $0; c++ } $3 in a || s != $5 { r=$0; c=""; delete a } c == n { print r; r=c=""; delete a } { a[$3]; s = $5 }' file
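
The same idea (an array for column 3, a variable for column 5) can also be spelled out over several lines. The following is only a sketch, not a drop-in replacement for the one-liner: the function names `complete_runs` and `emit_run` are made up here, and it assumes a run counts as complete only when it contains exactly `n` lines. A run ends when column 5 changes or a column-3 value repeats.

```shell
# Print only runs of exactly n consecutive lines that share column 5
# and contain no repeated value in column 3.
# Usage: complete_runs N < file
complete_runs() {
  awk -v n="$1" '
    # Print the buffered run if it has exactly n lines, then reset state.
    function emit_run(   k) {
      if (count == n) printf "%s", buf
      buf = ""; count = 0
      for (k in seen) delete seen[k]
    }
    # A new run starts when column 5 changes or column 3 repeats.
    $5 != key || ($3 in seen) { emit_run(); key = $5 }
    # Buffer the current line and remember its column-3 value.
    { buf = buf $0 ORS; count++; seen[$3] }
    # Handle the final run at end of input.
    END { emit_run() }
  '
}
```

In the sample data from the question, the complete groups L3818 and L3868 each span 21 lines (C through H13), so the call would be `complete_runs 21 < file`. Buffering the whole run before printing keeps incomplete groups such as L3782 or L3832 out of the output.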