Find and remove lines in a file by comparing multiple strings

Time: 2018-09-22 04:04:54

Tags: bash macos

I have the following file:

SOME TEXT AT START OF FILE
    STRING1 SMALL
    STRING2 SMALL
    STRING1 MEDIUM
    STRING3 LARGE
    STRING2 XLG
SOME TEXT TO SEPARATE LISTS
    STRING4 SMALL
    STRING1 MEDIUM
    STRING1 SMALL
    STRING5 LARGE
    STRING6 SMALL
SOME MORE TEXT TO SEPARATE LISTS
    ANOTHER LIST
...

For each list, I only want to keep the largest occurrence (S, M, L, XL) of each string, so that the result looks like this:

SOME TEXT AT START OF FILE
    STRING1 MEDIUM
    STRING3 LARGE
    STRING2 XLG
SOME TEXT TO SEPARATE LISTS
    STRING4 SMALL
    STRING1 MEDIUM
    STRING5 LARGE
    STRING6 SMALL
SOME MORE TEXT TO SEPARATE LISTS
    ANOTHER LIST
...

I don't know how to go about this. Please help. I am trying to do this in a bash script, run from the terminal on a Mac.


I also need to modify another, similar list:

TEXT
    STRING1
    STRING2
    STRING3
    STRING1
TEXT
    STRING4
    STRING1
TEXT
    STRING5
    STRING2
    STRING5
ETC...

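In this case, how can I eliminate the duplicate strings? I was going to try awk '!seen[$0]++' filename, but that would remove duplicates across the whole file instead of looking at each list separately.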

1 answer:

Answer 0: (score: 1)

The first question

$ cat tst.awk
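BEGIN {
    # Rank of each size; a higher number means a larger size.
    sz["SMALL"]  = 0
    sz["MEDIUM"] = 1
    sz["LARGE"]  = 2
    sz["XLG"]    = 3
}

# An unindented line starts a new list: print the previous list's winners,
# reset the per-list state, then print the header line itself.
/^[^ ]/ {
    dump()
    delete data
    print
    next
}

# Indented line: store the size in $2 if $1 has not been seen in this list
# yet, or if it outranks the size stored so far.
!($1 in data) || sz[data[$1]] < sz[$2] {
    data[$1] = $2
}

# Print the last list, which has no header line after it.
END {
    dump()
}

# k is a local variable. "for (k in data)" visits keys in an unspecified
# order, so entries may be printed in a different order than the input.
function dump(k) {
    for (k in data)
        print "    " k " " data[k]
}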
$
$ awk -f tst.awk file
SOME TEXT AT START OF FILE
    STRING1 MEDIUM
    STRING2 XLG
    STRING3 LARGE
SOME TEXT TO SEPARATE LISTS
    STRING4 SMALL
    STRING5 LARGE
    STRING6 SMALL
    STRING1 MEDIUM
SOME MORE TEXT TO SEPARATE LISTS
    ANOTHER LIST
...
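
The output above lists the entries of each list in a different order than the input, because the order in which awk's for (k in data) visits keys is unspecified. If the exact order shown in the question matters, one possible variant is sketched below. It is not part of the original answer: the function name keep_largest and the array names rank, line, key, best and size are made up for illustration. It remembers every line of the current list and prints only the line that holds the largest size for its string, in input order. It also assumes that blank lines can be dropped.

```shell
# Hypothetical helper: order-preserving variant of tst.awk.
# Reads the named files, or standard input if none are given.
keep_largest() {
    awk '
    BEGIN {
        # Rank of each size; a higher number means a larger size.
        rank["SMALL"] = 0; rank["MEDIUM"] = 1; rank["LARGE"] = 2; rank["XLG"] = 3
    }

    # Unindented line: flush the list collected so far, then print the header.
    /^[^ ]/ { emit_list(); print; next }

    # Indented, non-blank line: remember it and track, per string, the
    # position of the line with the largest size seen so far.
    NF {
        line[++n] = $0
        key[n] = $1
        if (!($1 in best) || rank[size[$1]] < rank[$2]) {
            best[$1] = n
            size[$1] = $2
        }
    }

    END { emit_list() }

    function emit_list(   i, k) {
        # Print only the winning line of each string, in input order.
        for (i = 1; i <= n; i++)
            if (best[key[i]] == i)
                print line[i]
        # Reset the per-list state one element at a time.
        for (k in best) delete best[k]
        for (k in size) delete size[k]
        n = 0
    }
    ' "$@"
}
```

Run as keep_largest file. To overwrite the original file, write to a temporary file first and then move it into place, for example keep_largest file > file.tmp && mv file.tmp file. Redirecting straight onto file would truncate it before awk reads it.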

The second one

awk '/^[^ ]/{delete seen}!seen[$0]++' file
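
For readers who find the one-liner dense, here is the same idea spelled out with comments, as a rough sketch rather than a replacement. The function name dedupe_per_list is hypothetical, and split("", seen) is swapped in for delete seen as a way of emptying the array that also works in awks without whole-array delete.

```shell
# Hypothetical helper: remove duplicate lines within each list separately.
dedupe_per_list() {
    awk '
    # A line that does not start with a space begins a new list,
    # so forget everything seen in the previous list.
    /^[^ ]/ { split("", seen) }

    # Print a line only the first time it appears in the current list.
    !seen[$0]++
    ' "$@"
}

# Small demonstration using part of the sample from the question.
printf '%s\n' 'TEXT' '    STRING1' '    STRING2' '    STRING3' '    STRING1' \
              'TEXT' '    STRING4' '    STRING1' | dedupe_per_list
# Prints:
# TEXT
#     STRING1
#     STRING2
#     STRING3
# TEXT
#     STRING4
#     STRING1
```

The repeated TEXT header still prints each time because seen is emptied before the header itself is checked.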