Question

I have a text file, lists.txt, that looks like this:

HI family what are u doing ?
HI Family
what are
Channel 5 is very cheap
Channel 5 is
Channel 5 is very
Pokemon
The best Pokemon is Pikachu

I want to clean it up by removing every line that is entirely contained in another line. That is, I want something like this:

HI family, what are u doing ?
The best Pokemon is Pikachu
Channel 5 is very cheap

I have tried counting the strings and then comparing them with grep, sorting results.txt and searching the large results.txt, but that did not get me very far.

Answer 1

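If I understand your question correctly, you want to take a list of strings and remove from it every string that is a substring of another string in the list.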

In pseudocode:

outer: for string s in l
    for string s2 in l
        if s != s2 and s substringOf s2
            continue outer
    print s

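That is, for each string, loop over all the strings, and abandon the current iteration of the outer loop as soon as any test in the inner loop matches.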

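Here is an implementation of that algorithm in bash. Note that the file (list.txt) is fed to the code twice through the redirection operator <, once for the outer loop and once for the inner loop.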

(I also cleaned up your example, which had quite a few typos.)

$ cat list.txt
HI family what are u doin?
HI family what are
Channel 5 is very cheap
Channel 5 is
Channel 5 is very
Pokemon
The best Pokemon is Pikachu
$ while IFS= read -r line; do while IFS= read -r line2; do [[ $line2 != "$line" ]] && [[ $line2 == *"$line"* ]] && continue 2; done <list.txt; echo "$line"; done <list.txt
HI family what are u doin?
Channel 5 is very cheap
The best Pokemon is Pikachu
$
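
For repeated use, the same nested loop can be unpacked into a small function. This is only a sketch under a few assumptions: the name remove_contained is made up for illustration, mapfile requires bash 4 or later, and the file is assumed to be small enough to hold in memory. It reads the file once into an array rather than re-reading it for every outer iteration, and it quotes "$line" inside the pattern so that characters such as ? are matched literally.

```bash
#!/usr/bin/env bash
# Sketch: print every line of a file that is not contained in a different line.
# The function name remove_contained is hypothetical, not part of any tool.
remove_contained() {
    local file=$1 line other
    local -a lines
    # Read the whole file once into an array (mapfile needs bash 4+).
    mapfile -t lines < "$file"
    for line in "${lines[@]}"; do
        for other in "${lines[@]}"; do
            # Quoting "$line" inside the pattern forces a literal match,
            # so ? and * in the data are not treated as wildcards.
            if [[ $other != "$line" && $other == *"$line"* ]]; then
                continue 2   # contained in another line: skip it
            fi
        done
        printf '%s\n' "$line"
    done
}
```

Calling remove_contained list.txt on the cleaned-up list.txt above should print the same three lines as the one-liner. The comparison is still quadratic in the number of lines and case-sensitive, so a line such as "HI Family" would not be treated as contained in "HI family what are u doin?".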
