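Pruning similar strings by length

Time: 2011-08-31 06:37:21

Tags: python list

How can I remove elements from a Python list of strings based on similarity and length (if a string Y is found inside another, longer string X, Y must be removed)?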

IN:     [('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)]
OUT:    [('this is string that stays', 0), ('i am safe', 3)]

4 answers:

Answer 0 (score: 0)

Here you go:

l = [('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)]
survivors = set(s for s, _ in l)
for s1, _ in l:
    # drop s1 if it is a proper substring of any string still in the set
    if any(s1 != s2 and s1 in s2 for s2 in survivors):
        survivors.discard(s1)

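survivors is what you want, except that it does not contain the numbers from the input tuples; changing that is left as an exercise for the reader :-P.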
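One possible way to do that exercise is to filter the original list against the set, which also preserves the input order. This is only a sketch; the name `result` is an assumption introduced here and is not part of the answer.

```python
l = [('this is string that stays', 0), ('this is string', 1),
     ('string that stays', 2), ('i am safe', 3)]

# Same pruning step as in the answer above.
survivors = set(s for s, _ in l)
for s1, _ in l:
    if any(s1 != s2 and s1 in s2 for s2 in survivors):
        survivors.discard(s1)

# `result` is a hypothetical name: keep the original tuples (and their
# order) for every string that survived the pruning.
result = [(s, n) for s, n in l if s in survivors]
print(result)
# → [('this is string that stays', 0), ('i am safe', 3)]
```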

Answer 1 (score: 0)

Try this:

IN = [('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)]
OUT=[]

def check_item(liste, item2check):
    for item, _ in liste:
        if item2check in item and len(item2check) < len(item):
            return True
    return False

for item, rank in IN:
    if not check_item(IN, item):
        OUT.append((item, rank))

# or as a list comprehension:
OUT = [(item, rank) for item, rank in IN if not check_item(IN, item)]
print(OUT)

>>> [('this is string that stays', 0), ('i am safe', 3)]

Answer 2 (score: 0)

If you don't mind O(N*N) complexity:

>>> s=[('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)]
>>> s=[i[0] for i in s]
>>> result=[s[i] for i in range(len(s)) if not any(s[i] in s[j] for j in range(len(s)) if j != i)]
>>> result
['this is string that stays', 'i am safe']

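If you care about efficiency, I suggest splitting each string into a sequence of words (or even characters) and building a tree data structure such as a trie (http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=usingTries), which allows fast lookup of each subsequence.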
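A rough sketch of the trie idea, under some assumptions: strings are split on whitespace, every word-level suffix of every string is inserted into a nested-dict trie, and each node records which input strings pass through it. The function name `prune_contained` and the use of `None` as the key for the owner set are choices made for this illustration only. Matching is per word, so unlike the `in` operator it would not find 'string' inside 'strings'.

```python
def prune_contained(pairs):
    """Drop each (text, rank) whose word sequence occurs inside a longer text.

    Hypothetical helper: word-level matching only (an assumption), so it is
    not equivalent to a character-level substring test.
    """
    trie = {}
    word_lists = [text.split() for text, _ in pairs]

    # Insert every suffix of every word list; each node remembers the
    # indices of the input strings whose words pass through it.
    for idx, words in enumerate(word_lists):
        for start in range(len(words)):
            node = trie
            for word in words[start:]:
                node = node.setdefault(word, {})
                node.setdefault(None, set()).add(idx)

    kept = []
    for idx, (text, rank) in enumerate(pairs):
        node = trie
        for word in word_lists[idx]:
            node = node[word]  # always present: the string's own suffix was inserted
        owners = node.get(None, set())
        # Contained only if some *longer* string also reaches this node.
        if not any(len(word_lists[j]) > len(word_lists[idx]) for j in owners):
            kept.append((text, rank))
    return kept


pairs = [('this is string that stays', 0), ('this is string', 1),
         ('string that stays', 2), ('i am safe', 3)]
print(prune_contained(pairs))
# → [('this is string that stays', 0), ('i am safe', 3)]
```

Each lookup walks only as many nodes as the string has words, but storing every suffix makes the trie quadratic in the number of words per string, so this trades memory for lookup speed.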

Answer 3 (score: 0)

The other answers all provide good solutions. I just want to add a note on your attempt:

for i in range(0, len(d)):
  for j in range(1, len(d)):
    if d[j][0] in d[i][0] and len(d[i][0]) > len(d[j][0]): 
      del d[j]

This fails with "list index out of range" because you delete from the list while iterating over it. Here is one way to avoid the problem:

d = [('this is string that stays', 0), ('this is string', 1), ('string that stays', 2), ('i am safe', 3)]

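to_be_removed = set()  # a set, so an index found by several i is recorded only once
for i in range(0, len(d)):
  for j in range(0, len(d)): 
    if i != j and d[j][0] in d[i][0] and len(d[i][0]) > len(d[j][0]):
      to_be_removed.add(j)
# delete from the highest index down so earlier deletions don't shift later ones
for n in sorted(to_be_removed, reverse=True):
  del d[n]

print(d)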
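For reference, the failure described above can be reproduced with a small sketch like the following. The `try`/`except` wrapper and the name `error` are additions for illustration only; the loop body is the original attempt.

```python
d = [('this is string that stays', 0), ('this is string', 1),
     ('string that stays', 2), ('i am safe', 3)]

error = None  # hypothetical name, used only to capture the exception
try:
    for i in range(0, len(d)):
        # The inner range is built from the length *before* any deletion
        # made during this pass.
        for j in range(1, len(d)):
            if d[j][0] in d[i][0] and len(d[i][0]) > len(d[j][0]):
                del d[j]  # the list shrinks, but j still counts to the old length
except IndexError as exc:
    error = exc

print(repr(error))
print(d)
```

With this input the first deletion also shifts 'string that stays' into the index that was just checked, so that element is skipped before the out-of-range access happens.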