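I am trying to remove redundant sentences from a Python list. One sentence can be contained in another, and in that case I want to keep only the longest one.
For example: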
my_list = ['Her name is Laura and she\'s from Texas', 'October',
           'He owns a dog and a cat', 'Her name is Laura', 'He owns a dog',
           'Marie will turn eighteen in October']
After processing:
my_list = ['Her name is Laura and she\'s from Texas',
           'He owns a dog and a cat', 'Marie will turn eighteen in October']
Answer 0 (score: 3)
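A (slightly better than) quadratic solution: sort the list by length, then check whether each entry is a substring of any of the longer strings that come after it.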
my_list = sorted(my_list, key=len)  # sort in ascending order of length
indices_to_delete = set()
for i, x in enumerate(my_list):
    # Only the strings after x are at least as long, so only they can contain it.
    # Start at i + 1 so that x is never compared with itself.
    for y in my_list[i + 1:]:
        if x in y:
            indices_to_delete.add(i)
            break
my_list = [x for i, x in enumerate(my_list) if i not in indices_to_delete]
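One drawback of this approach is that it reorders your data by sorting it. If you need the original order preserved, don't use it as written.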
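If the original order matters, one possible workaround is to sort the indices instead of the list itself. The following is only a sketch of that idea, not part of the answer above; the function name `drop_contained` is made up for illustration. Because each string is compared only with the strings at least as long as itself, an exact duplicate is treated as contained in its later copy, so only the last copy survives.

```python
def drop_contained(sentences):
    """Return the sentences that are not substrings of another entry, in original order."""
    # Visit indices from shortest to longest string; sorted() is stable,
    # so entries of equal length keep their original relative order.
    order = sorted(range(len(sentences)), key=lambda i: len(sentences[i]))
    redundant = set()
    for pos, i in enumerate(order):
        # Only entries later in `order` are at least as long as sentences[i].
        if any(sentences[i] in sentences[j] for j in order[pos + 1:]):
            redundant.add(i)
    # Filter the untouched input list, so its order is preserved.
    return [s for i, s in enumerate(sentences) if i not in redundant]


my_list = ['Her name is Laura and she\'s from Texas', 'October',
           'He owns a dog and a cat', 'Her name is Laura', 'He owns a dog',
           'Marie will turn eighteen in October']
print(drop_contained(my_list))
```

For the example list this keeps the three longest sentences in the order they originally appeared.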
Answer 1 (score: 0)
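This method counts, for each string, how many strings in the list contain it as a substring (the string itself included), and eliminates any string that is contained in more than one.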
my_list = [
    'Her name is Laura and she\'s from Texas', 'October',
    'He owns a dog and a cat', 'Her name is Laura', 'He owns a dog',
    'Marie will turn eighteen in October'
]
# For each sentence, count how many list entries contain it (itself included).
redundant_counts = [
    len([sent for other_sent in my_list if sent in other_sent]) for sent in my_list
]
# A count of 1 means the sentence is contained only in itself, so keep it.
my_list = [
    sent for count, sent in zip(redundant_counts, my_list) if count == 1
]
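One edge case worth noting: if the list contains exact duplicates, each copy is counted as contained in the other, so the count is at least 2 and every copy is dropped. A minimal sketch of one way around this, assuming it is acceptable to collapse duplicates into a single entry, is to de-duplicate before counting. The helper name `keep_maximal` is hypothetical.

```python
def keep_maximal(sentences):
    """Counting approach, with exact duplicates collapsed before counting."""
    # dict.fromkeys drops repeated entries while keeping first-seen order.
    unique = list(dict.fromkeys(sentences))
    # After de-duplication, a count of 1 means the sentence is contained
    # only in itself, so it is not redundant.
    return [s for s in unique if sum(s in other for other in unique) == 1]


print(keep_maximal(['He owns a dog', 'He owns a dog and a cat',
                    'He owns a dog and a cat']))
```

Like the counting code above, this keeps the original order and still performs a quadratic number of substring checks.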