Remove strings that are contained in other strings from a list

Time: 2014-05-28 16:02:36

Tags: python string bash list shell

I have a text file, lists.txt, that looks like this:

HI family what are u doing ?
HI Family
what are
Channel 5 is very cheap
Channel 5 is
Channel 5 is very
Pokemon
The best Pokemon is Pikachu

I want to clean it up by removing every line that is entirely contained in another line. That is, I want something like this:

HI family, what are u doing ?
The best Pokemon is Pikachu
Channel 5 is very cheap

I have tried computing a large number of strings and comparing them with grep, and sorting a large results.txt, but it did not work well.

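1 answer:

Answer 0: (score: 7)

If I understand your question, you want to take a list of strings and remove from it every string that is a substring of another string in the list.

In pseudocode,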

outer: for string s in l
    for string s2 in l
        if s != s2 and s substringOf s2
            continue outer
    print s

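That is, for each string, loop over all the strings once, and abandon the current run of the outer loop as soon as any test in the inner loop matches.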
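
Since the question is also tagged python, the same nested loop can be sketched in Python. This is only an illustration, not part of the original answer: the function name `remove_contained` and the in-memory `sample` list are assumptions made for the example. Like the bash implementation, it skips the comparison of a string with itself, because every string is trivially a substring of itself.

```python
def remove_contained(lines):
    """Return the lines that are not a substring of any different line."""
    kept = []
    for s in lines:
        # any() stops at the first match, which plays the role of
        # `continue outer` in the pseudocode.
        if any(s != s2 and s in s2 for s2 in lines):
            continue
        kept.append(s)
    return kept


# Hypothetical in-memory copy of the cleaned-up example data.
sample = [
    "HI family what are u doin?",
    "HI family what are",
    "Channel 5 is very cheap",
    "Channel 5 is",
    "Channel 5 is very",
    "Pokemon",
    "The best Pokemon is Pikachu",
]

for line in remove_contained(sample):
    print(line)
```

The `in` test is case-sensitive, and the sketch performs one comparison per pair of lines, so it is quadratic in the number of lines, just like the pseudocode.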

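Here is an implementation of that algorithm in bash. Note that the file (list.txt) is read twice in the code through the redirection operator <, once for the outer loop and once for the inner loop.

(I also cleaned up your example, which had a lot of typos.)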

$ cat list.txt
HI family what are u doin?
HI family what are
Channel 5 is very cheap
Channel 5 is
Channel 5 is very
Pokemon
The best Pokemon is Pikachu
$ while IFS= read -r line; do while IFS= read -r line2; do [[ $line2 != "$line" && $line2 == *"$line"* ]] && continue 2; done <list.txt; echo "$line"; done <list.txt
HI family what are u doin?
Channel 5 is very cheap
The best Pokemon is Pikachu
$