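Is it possible to use awk to extract only the runs of consecutive lines (20 in this case) that have different values in the 3rd column (from C to H13) and the same value in the 5th column, from a file with the following structure: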
.........................................................................
LINE 564 C LESS L3782 246.617 200.380 10.086 1.00 0.00 L
LINE 565 C1 LESS L3782 247.525 201.163 9.136 1.00 0.00 L
LINE 566 C2 LESS L3782 247.265 202.663 9.269 1.00 0.00 L
LINE 567 C3 LESS L3782 249.012 200.776 9.298 1.00 0.00 L
LINE 568 C4 LESS L3782 249.659 201.089 10.654 1.00 0.00 L
LINE 569 C5 LESS L3782 251.029 200.429 10.766 1.00 0.00 L
LINE 570 O LESS L3782 249.832 202.495 10.789 1.00 0.00 L
LINE 571 H LESS L3782 246.797 199.303 9.997 1.00 0.00 L
LINE 572 H1 LESS L3782 246.772 200.668 11.130 1.00 0.00 L
LINE 592 C LESS L3818 134.617 208.380 10.086 1.00 0.00 L
LINE 593 C1 LESS L3818 135.525 209.163 9.136 1.00 0.00 L
LINE 594 C2 LESS L3818 135.265 210.663 9.269 1.00 0.00 L
LINE 595 C3 LESS L3818 137.012 208.776 9.298 1.00 0.00 L
LINE 596 C4 LESS L3818 137.659 209.089 10.654 1.00 0.00 L
LINE 597 C5 LESS L3818 139.029 208.429 10.766 1.00 0.00 L
LINE 598 O LESS L3818 137.832 210.495 10.789 1.00 0.00 L
LINE 599 H LESS L3818 134.797 207.303 9.997 1.00 0.00 L
LINE 600 H1 LESS L3818 134.772 208.668 11.130 1.00 0.00 L
LINE 601 H2 LESS L3818 133.564 208.562 9.845 1.00 0.00 L
LINE 602 H3 LESS L3818 135.242 208.879 8.114 1.00 0.00 L
LINE 603 H4 LESS L3818 135.381 211.008 10.301 1.00 0.00 L
LINE 604 H5 LESS L3818 134.241 210.901 8.961 1.00 0.00 L
LINE 605 H6 LESS L3818 135.946 211.237 8.632 1.00 0.00 L
LINE 606 H7 LESS L3818 137.579 209.288 8.508 1.00 0.00 L
LINE 607 H8 LESS L3818 137.099 207.700 9.100 1.00 0.00 L
LINE 608 H9 LESS L3818 137.027 208.740 11.477 1.00 0.00 L
LINE 609 H10 LESS L3818 138.225 210.662 11.662 1.00 0.00 L
LINE 610 H11 LESS L3818 139.496 208.674 11.726 1.00 0.00 L
LINE 611 H12 LESS L3818 138.955 207.340 10.685 1.00 0.00 L
LINE 612 H13 LESS L3818 139.705 208.795 9.985 1.00 0.00 L
LINE 618 C5 LESS L3832 251.029 208.429 10.766 1.00 0.00 L
LINE 619 O LESS L3832 249.832 210.495 10.789 1.00 0.00 L
LINE 620 H LESS L3832 246.797 207.303 9.997 1.00 0.00 L
LINE 621 H1 LESS L3832 246.772 208.668 11.130 1.00 0.00 L
LINE 622 H2 LESS L3832 245.564 208.562 9.845 1.00 0.00 L
LINE 626 H6 LESS L3832 247.946 211.237 8.632 1.00 0.00 L
LINE 627 H7 LESS L3832 249.579 209.288 8.508 1.00 0.00 L
LINE 628 H8 LESS L3832 249.099 207.700 9.100 1.00 0.00 L
LINE 629 H9 LESS L3832 249.027 208.740 11.477 1.00 0.00 L
LINE 630 H10 LESS L3832 250.225 210.662 11.662 1.00 0.00 L
LINE 631 H11 LESS L3832 251.496 208.674 11.726 1.00 0.00 L
LINE 632 H12 LESS L3832 250.955 207.340 10.685 1.00 0.00 L
LINE 633 H13 LESS L3832 251.705 208.795 9.985 1.00 0.00 L
LINE 638 C LESS L3868 134.617 216.380 10.086 1.00 0.00 L
LINE 639 C1 LESS L3868 135.525 217.163 9.136 1.00 0.00 L
LINE 640 C2 LESS L3868 135.265 218.663 9.269 1.00 0.00 L
LINE 641 C3 LESS L3868 137.012 216.776 9.298 1.00 0.00 L
LINE 642 C4 LESS L3868 137.659 217.089 10.654 1.00 0.00 L
LINE 643 C5 LESS L3868 139.029 216.429 10.766 1.00 0.00 L
LINE 644 O LESS L3868 137.832 218.495 10.789 1.00 0.00 L
LINE 645 H LESS L3868 134.797 215.303 9.997 1.00 0.00 L
LINE 646 H1 LESS L3868 134.772 216.668 11.130 1.00 0.00 L
LINE 647 H2 LESS L3868 133.564 216.562 9.845 1.00 0.00 L
LINE 648 H3 LESS L3868 135.242 216.879 8.114 1.00 0.00 L
LINE 649 H4 LESS L3868 135.381 219.008 10.301 1.00 0.00 L
LINE 650 H5 LESS L3868 134.241 218.901 8.961 1.00 0.00 L
LINE 651 H6 LESS L3868 135.946 219.237 8.632 1.00 0.00 L
LINE 652 H7 LESS L3868 137.579 217.288 8.508 1.00 0.00 L
LINE 653 H8 LESS L3868 137.099 215.700 9.100 1.00 0.00 L
LINE 654 H9 LESS L3868 137.027 216.740 11.477 1.00 0.00 L
LINE 655 H10 LESS L3868 138.225 218.662 11.662 1.00 0.00 L
LINE 656 H11 LESS L3868 139.496 216.674 11.726 1.00 0.00 L
LINE 657 H12 LESS L3868 138.955 215.340 10.685 1.00 0.00 L
LINE 658 H13 LESS L3868 139.705 216.795 9.985 1.00 0.00 L
LINE 677 O LESS L3882 249.832 218.495 10.789 1.00 0.00 L
LINE 678 H LESS L3882 246.797 215.303 9.997 1.00 0.00 L
LINE 679 H1 LESS L3882 246.772 216.668 11.130 1.00 0.00 L
LINE 680 H2 LESS L3882 245.564 216.562 9.845 1.00 0.00 L
.........................................................................
so that the output looks like this:
LINE 592 C LESS L3818 134.617 208.380 10.086 1.00 0.00 L
LINE 593 C1 LESS L3818 135.525 209.163 9.136 1.00 0.00 L
LINE 594 C2 LESS L3818 135.265 210.663 9.269 1.00 0.00 L
LINE 595 C3 LESS L3818 137.012 208.776 9.298 1.00 0.00 L
LINE 596 C4 LESS L3818 137.659 209.089 10.654 1.00 0.00 L
LINE 597 C5 LESS L3818 139.029 208.429 10.766 1.00 0.00 L
LINE 598 O LESS L3818 137.832 210.495 10.789 1.00 0.00 L
LINE 599 H LESS L3818 134.797 207.303 9.997 1.00 0.00 L
LINE 600 H1 LESS L3818 134.772 208.668 11.130 1.00 0.00 L
LINE 601 H2 LESS L3818 133.564 208.562 9.845 1.00 0.00 L
LINE 602 H3 LESS L3818 135.242 208.879 8.114 1.00 0.00 L
LINE 603 H4 LESS L3818 135.381 211.008 10.301 1.00 0.00 L
LINE 604 H5 LESS L3818 134.241 210.901 8.961 1.00 0.00 L
LINE 605 H6 LESS L3818 135.946 211.237 8.632 1.00 0.00 L
LINE 606 H7 LESS L3818 137.579 209.288 8.508 1.00 0.00 L
LINE 607 H8 LESS L3818 137.099 207.700 9.100 1.00 0.00 L
LINE 608 H9 LESS L3818 137.027 208.740 11.477 1.00 0.00 L
LINE 609 H10 LESS L3818 138.225 210.662 11.662 1.00 0.00 L
LINE 610 H11 LESS L3818 139.496 208.674 11.726 1.00 0.00 L
LINE 611 H12 LESS L3818 138.955 207.340 10.685 1.00 0.00 L
LINE 612 H13 LESS L3818 139.705 208.795 9.985 1.00 0.00 L
LINE 638 C LESS L3868 134.617 216.380 10.086 1.00 0.00 L
LINE 639 C1 LESS L3868 135.525 217.163 9.136 1.00 0.00 L
LINE 640 C2 LESS L3868 135.265 218.663 9.269 1.00 0.00 L
LINE 641 C3 LESS L3868 137.012 216.776 9.298 1.00 0.00 L
LINE 642 C4 LESS L3868 137.659 217.089 10.654 1.00 0.00 L
LINE 643 C5 LESS L3868 139.029 216.429 10.766 1.00 0.00 L
LINE 644 O LESS L3868 137.832 218.495 10.789 1.00 0.00 L
LINE 645 H LESS L3868 134.797 215.303 9.997 1.00 0.00 L
LINE 646 H1 LESS L3868 134.772 216.668 11.130 1.00 0.00 L
LINE 647 H2 LESS L3868 133.564 216.562 9.845 1.00 0.00 L
LINE 648 H3 LESS L3868 135.242 216.879 8.114 1.00 0.00 L
LINE 649 H4 LESS L3868 135.381 219.008 10.301 1.00 0.00 L
LINE 650 H5 LESS L3868 134.241 218.901 8.961 1.00 0.00 L
LINE 651 H6 LESS L3868 135.946 219.237 8.632 1.00 0.00 L
LINE 652 H7 LESS L3868 137.579 217.288 8.508 1.00 0.00 L
LINE 653 H8 LESS L3868 137.099 215.700 9.100 1.00 0.00 L
LINE 654 H9 LESS L3868 137.027 216.740 11.477 1.00 0.00 L
LINE 655 H10 LESS L3868 138.225 218.662 11.662 1.00 0.00 L
LINE 656 H11 LESS L3868 139.496 216.674 11.726 1.00 0.00 L
LINE 657 H12 LESS L3868 138.955 215.340 10.685 1.00 0.00 L
LINE 658 H13 LESS L3868 139.705 216.795 9.985 1.00 0.00 L
Thank you, Alin
Answer 0: (score: 0)
awk '{if (a[$3,$5]++ == 0) print}'
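Strictly speaking, this does not check for contiguity: if more L3818 entries appear further down the file, it still remembers the ones seen near the top and suppresses the repeats. If that is a problem, you can reset the array whenever column 5 changes:
awk '{if ($5 != old_5) {delete a; old_5 = $5} if (a[$3,$5]++ == 0) print}'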
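To make the difference between the two approaches concrete, here is a minimal sketch on a made-up four-line sample (the file contents and the names `seen`, `prev`, `global`, and `per_run` are assumptions for illustration, not part of the question). It uses `split("", seen)` to clear the array, which should also work in awk implementations that do not accept `delete` on a whole array.

```shell
# Hypothetical sample: key L1 appears in two separate runs.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
LINE 1 C LESS L1
LINE 2 H LESS L1
LINE 3 C LESS L2
LINE 4 C LESS L1
EOF

# Global de-duplication: (column 3, column 5) pairs are remembered
# for the whole file, so the second "C ... L1" line is dropped.
global=$(awk '!seen[$3,$5]++' "$tmp")

# Per-run de-duplication: forget everything whenever column 5 changes,
# so the second "C ... L1" line starts a fresh run and is kept.
per_run=$(awk '$5 != prev { split("", seen); prev = $5 } !seen[$3]++' "$tmp")

printf 'global:\n%s\nper run:\n%s\n' "$global" "$per_run"
rm -f "$tmp"
```

On this sample the first command keeps three lines and the second keeps all four. Neither variant checks that a run is complete; they only suppress repeated column 3 values.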
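Answer 1: (score: -1)
Yes. Like Perl, AWK is a data extraction and reporting tool. You can use an array to check that the values in the third column are unique within a run, and a variable to store the value of the fifth column and detect when it changes.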
awk -v n=20 '{ r = (r ? r RS : "") $0; c++ } $3 in a || s != $5 { r=$0; c=""; delete a } c == n { print r; r=c=""; delete a } { a[$3]; s = $5 }' file
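The same idea can be written out in a longer, commented form. The following is only a sketch under stated assumptions: the helper `emit_run`, the variables `buf`, `count`, `seen`, and `prev`, and the eight-line sample are all hypothetical, and here `n` is the exact number of lines a complete run must contain. It buffers each run of lines that share column 5 and have distinct column 3 values, and prints the buffer only when the run reaches `n` lines.

```shell
# Hypothetical sample: runs L1 and L3 are complete (3 lines), L2 is not.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
LINE 1 C LESS L1
LINE 2 H LESS L1
LINE 3 O LESS L1
LINE 4 H LESS L2
LINE 5 O LESS L2
LINE 6 C LESS L3
LINE 7 H LESS L3
LINE 8 O LESS L3
EOF

complete=$(awk -v n=3 '
    # Print the buffered run if it has exactly n lines, then reset state.
    function emit_run() {
        if (count == n) printf "%s", buf
        buf = ""; count = 0; split("", seen)
    }
    # A new run starts when column 5 changes or column 3 repeats.
    $5 != prev || ($3 in seen) { emit_run(); prev = $5 }
    # Append the current line to the buffer and record its column 3 value.
    { buf = buf $0 ORS; count++; seen[$3] }
    # Do not forget the last run in the file.
    END { emit_run() }
' "$tmp")

printf '%s\n' "$complete"
rm -f "$tmp"
```

With this sample only the L1 and L3 runs are printed. In the data posted in the question, each complete block from C to H13 spans 21 lines, so under this sketch's counting `n` would be 21 rather than 20.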