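Removing redundant sentences from a list using Python

Time: 2017-06-15 13:27:22

Tags: python

I am trying to remove redundant sentences from a Python list. A sentence can be contained in another sentence, and I want to keep only the longest one.

For example: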

    my_list = ['Her name is Laura and she\'s from Texas', 'October',
               'He owns a dog and a cat', 'Her name is Laura', 'He owns a dog',
               'Marie will turn eighteen in October']

After processing:

    my_list = ['Her name is Laura and she\'s from Texas',
               'He owns a dog and a cat', 'Marie will turn eighteen in October']

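2 answers:

Answer 0 (score: 3)

A (slightly better than) quadratic solution: after sorting by length, check whether each shorter entry is a substring of one of the longer strings.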

    my_list = sorted(my_list, key=len)  # sort in ascending order of length

    indices_to_delete = set()
    for i, x in enumerate(my_list):
        # strings after position i are at least as long as x, so only they
        # need to be checked; starting at i + 1 avoids comparing x with itself
        for y in my_list[i + 1:]:
            if x in y:
                indices_to_delete.add(i)
                break

    my_list = [x for i, x in enumerate(my_list) if i not in indices_to_delete]

One drawback of this approach is that it sorts your data. If you do not want that to happen, do not use it.
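If the original order matters, one possible alternative is to skip the sort and compare every sentence against every other one. The following is only a sketch, not part of the answer above; the function name `drop_contained` is hypothetical, and the rule of keeping the first copy of an exact duplicate is an assumption about the desired behavior.

```python
def drop_contained(sentences):
    """Return the sentences that no other sentence contains, in original order."""
    kept = []
    for i, sent in enumerate(sentences):
        contained = any(
            # a different, longer sentence contains this one, or an
            # identical copy already appeared earlier in the list
            sent in other and (sent != other or j < i)
            for j, other in enumerate(sentences)
            if j != i
        )
        if not contained:
            kept.append(sent)
    return kept


my_list = [
    'Her name is Laura and she\'s from Texas', 'October',
    'He owns a dog and a cat', 'Her name is Laura', 'He owns a dog',
    'Marie will turn eighteen in October'
]
print(drop_contained(my_list))
```

This is fully quadratic in the number of sentences, so it gives up the small saving of the sorted version in exchange for leaving the order untouched.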

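Answer 1 (score: 0)

This method counts, for each string, how many strings in the list contain it as a substring (itself included), and removes every string that is contained more than once.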

    my_list = [
        'Her name is Laura and she\'s from Texas', 'October',
        'He owns a dog and a cat', 'Her name is Laura', 'He owns a dog',
        'Marie will turn eighteen in October'
    ]

    # for each sentence, count the sentences (itself included) that contain it
    redundant_counts = [
        len([sent for other_sent in my_list if sent in other_sent]) for sent in my_list
    ]

    # a count of 1 means the sentence is contained only in itself
    my_list = [
        sent for count, sent in zip(redundant_counts, my_list) if count == 1
    ]