Sorting lines in a text file between patterns

Time: 2019-11-14 10:26:46

Tags: python awk sed

I am trying to sort the lines between patterns in Bash or Python. I want to sort the lines by the second field, using "," as the delimiter.

Given the following text input file:

Sample1
T1,64,0.65  MEDIUM
T2,60,0.45  LOW
T3,301,0.68  MEDIUM
T4,65,0.75  HIGH
T5,59,0.72  MEDIUM
T6,51,0.82  HIGH
Sample2
T1,153,0.77  HIGH
T2,152,0.61  MEDIUM
T3,154,0.67  MEDIUM
T4,283,0.66  MEDIUM
T5,161,0.65  MEDIUM
Sample3
T1,147,0.71  MEDIUM
T2,154,0.63  MEDIUM
T3,45,0.63  MEDIUM
T4,259,0.77  HIGH

I would like this as output:

Sample1
T6,51,0.82  HIGH
T5,59,0.72  MEDIUM
T2,60,0.45  LOW
T1,64,0.65  MEDIUM
T4,65,0.75  HIGH
T3,301,0.68  MEDIUM
Sample2
T2,152,0.61  MEDIUM
T1,153,0.77  HIGH
T3,154,0.67  MEDIUM
T5,161,0.65  MEDIUM
T4,283,0.66  MEDIUM
Sample3
T3,45,0.63  MEDIUM
T1,147,0.71  MEDIUM
T2,154,0.63  MEDIUM
T4,259,0.77  HIGH

I have tried the suggestion made by glenn jackman in another post, but as far as I have tested, it only works with 2 patterns:

> gawk -v cmd="sort -k2" p=1 '
>     /^PATTERN2/ {          # when we see the 2nd marker:
>         close("cmd", "to");
>         while (("cmd" |& getline line) >0) print line 
>         p=1
>     }
>     p  {print}             # if p is true, print the line
>     !p {print |& "cmd"}   # if p is false, send the line to `sort`
>     /^PATTERN1/ {p=0}      # when we see the first marker, turn off printing ' FILE

2 answers:

Answer 0 (score: 3)

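You can do this with GNU awk as follows (PATTERN stands for a regular expression that matches your header lines, for example ^Sample for the input above):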

$ awk 'BEGIN{PROCINFO["sorted_in"]="@val_num_asc"; FS=","}
       /PATTERN/{
         for(i in a) print i
         delete a
         print; next
       }
       { a[$0]=$2 }
       END{ for(i in a) print i }' file

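With PROCINFO["sorted_in"]="@val_num_asc", we tell GNU awk to traverse the array in ascending numeric order of the element values. The idea is to build an array keyed by the whole line, with the second field as the value. We do not use the second field as the key because it may contain duplicates. It can still be done that way, however, if lines that share a key are concatenated: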

$ awk 'BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"; FS=","}  # keys are $2 here, so order by index
       /PATTERN/{
         for(i in a) print a[i]
         delete a
         print; next
       }
       ($2 in a){ a[$2]=a[$2] ORS $0; next }
       { a[$2] = $0 }
       END{ for(i in a) print a[i] }' file
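
For readers more comfortable with Python, the two array layouts above can be mirrored with dictionaries. This is only an illustrative sketch and not part of the original answer: the names `sort_block_by_value` and `sort_block_by_key` are made up here, and it assumes every data line in a block has an integer second field.

```python
from collections import defaultdict


def sort_block_by_value(lines):
    """Mirror of the first awk program: whole line -> second field."""
    # As in awk, identical lines collapse into a single dictionary key.
    values = {line: int(line.split(",")[1]) for line in lines}
    return sorted(values, key=values.get)


def sort_block_by_key(lines):
    """Mirror of the second awk program: second field -> lines sharing it."""
    groups = defaultdict(list)
    for line in lines:
        groups[int(line.split(",")[1])].append(line)
    # Walk the keys in ascending numeric order.
    return [line for key in sorted(groups) for line in groups[key]]


block = ["T1,64,0.65  MEDIUM", "T2,60,0.45  LOW", "T3,301,0.68  MEDIUM"]
print(sort_block_by_value(block))
print(sort_block_by_key(block))
```

Both functions return the lines of one block ordered by the second field; the second layout also keeps lines that are exact duplicates of each other.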

Answer 1 (score: 1)

See the function below.

def sort_lines_by_second_field(source_filename: str, destination_filename: str):
    # Read the whole file into memory.
    with open(source_filename) as source:
        lines = source.readlines()
    # Sort numerically on the second comma-separated field.
    # Assumes every line has such a field.
    lines.sort(key=lambda row: int(row.split(',')[1]))
    with open(destination_filename, "w") as destination:
        destination.writelines(lines)

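It reads all the lines, sorts them by the second field after converting it to an integer, and writes the result to the destination file. Note that it sorts the whole file at once, so header lines such as Sample1, which have no second field, raise an IndexError and have to be handled separately.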
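
To sort each block on its own and leave the header lines in place, something along the following lines may help. It is a minimal sketch rather than the answerer's method: `sort_blocks` is a hypothetical helper name, and the code assumes that header lines are exactly the lines that contain no comma.

```python
from itertools import groupby


def sort_blocks(lines):
    """Sort data lines numerically on field 2 within each header-delimited block."""
    result = []
    # groupby yields consecutive runs of header lines and of data lines.
    for is_data, run in groupby(lines, key=lambda line: "," in line):
        run = list(run)
        if is_data:
            run.sort(key=lambda line: int(line.split(",")[1]))
        result.extend(run)
    return result


sample = [
    "Sample1", "T1,64,0.65  MEDIUM", "T2,60,0.45  LOW",
    "Sample2", "T1,153,0.77  HIGH", "T2,152,0.61  MEDIUM",
]
print("\n".join(sort_blocks(sample)))
```

To work on files, the lines can be read with `readlines()` and the result written back with `writelines()`, as the function in the answer does.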