I have a text file, lists.txt, that looks like this:
HI family what are u doing ?
HI Family
what are
Channel 5 is very cheap
Channel 5 is
Channel 5 is very
Pokemon
The best Pokemon is Pikachu
I want to clean it up by deleting every line that is entirely contained in another line. That is, I want something like this:
HI family, what are u doing ?
The best Pokemon is Pikachu
Channel 5 is very cheap
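I have already tried working over a large number of the strings, comparing them with grep, and sorting the output into a large results.txt, but that did not get me very far.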
Answer 0 (score: 7)
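If I understand your question, you want to take a list of strings and remove from it every string that is a substring of some other string in the list.
In pseudocode: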
outer: for string s in l
    for string s2 in l
        if s2 != s and s substringOf s2
            continue outer
    print s
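That is, for each string, loop over all of the strings once, and skip to the next iteration of the outer loop as soon as any test in the inner loop matches. A string is never tested against itself, since every string trivially contains itself.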
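Here is an implementation of that algorithm in bash. Note that the file (list.txt) is read twice in the code via the redirection operator <, once for the outer loop and once for the inner loop.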
(I also cleaned up your example, which had a lot of typos.)
$ cat list.txt
HI family what are u doin?
HI family what are
Channel 5 is very cheap
Channel 5 is
Channel 5 is very
Pokemon
The best Pokemon is Pikachu
$ while IFS= read -r line; do while IFS= read -r line2; do [[ $line2 != "$line" && $line2 == *"$line"* ]] && continue 2; done <list.txt; echo "$line"; done <list.txt
HI family what are u doin?
Channel 5 is very cheap
The best Pokemon is Pikachu
$
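
For longer files it may be easier to keep the same loop in a small function. The following is only a sketch, not part of the original answer: filter_contained is a hypothetical name, and it assumes bash 4 or later for mapfile. It reads the file into an array once instead of reopening it for every line, and it quotes the inner pattern so that characters such as ? and * in the data are matched literally.

```bash
#!/usr/bin/env bash
# Sketch: print only the lines that are not contained in some other, different line.
# filter_contained is a hypothetical helper name; mapfile requires bash 4+.
filter_contained() {
    local file=$1 s s2
    local -a lines
    # Read the whole file once; -t strips the trailing newline from each line.
    mapfile -t lines < "$file"
    for s in "${lines[@]}"; do
        for s2 in "${lines[@]}"; do
            # Quoting "$s" makes the comparison literal rather than a glob pattern.
            if [[ $s2 != "$s" && $s2 == *"$s"* ]]; then
                continue 2   # s is contained in s2, so move on to the next s
            fi
        done
        printf '%s\n' "$s"
    done
}
```

Calling filter_contained list.txt should print the same lines as the one-liner above. As in the one-liner, exact duplicate lines are not treated as containing each other, so both copies are kept; the work is still quadratic in the number of lines.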