Check whether Python list items are substrings of other items

Date: 2017-10-02 15:01:00

Tags: python list

Say I have a list:

    list = ['Apple', 'apple cider', 'apple juice', 'Mango', 'Mangosteen', 'Banana']

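How can I detect that a list item is a substring of other list items, and then remove those other items? The list should then look like this:

    list = ['Apple', 'Mango', 'Banana']

I only need the most basic version of each string in the list.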

2 answers:

Answer 0 (score: 0)

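A few things. First, you shouldn't use list as a variable name: it is not a keyword, but it is the name of a built-in type, and assigning to it shadows that type. Also, I use lower() when comparing, because letter case doesn't seem to matter here.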
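A small sketch illustrating both points: the `in` operator on strings is case-sensitive, and once the name `list` is rebound at module level, calling `list(...)` fails until the name is deleted again. The variable `shadow_failed` is only a hypothetical flag for the demonstration.

```python
# 'in' on strings is case-sensitive, so normalise both sides first
print('Apple' in 'apple cider')                  # False
print('Apple'.lower() in 'apple cider'.lower())  # True

# Rebinding the name `list` shadows the built-in type in this scope
list = ['Apple', 'Mango']
shadow_failed = False
try:
    list('abc')  # `list` now refers to a list object, which is not callable
except TypeError:
    shadow_failed = True
print(shadow_failed)  # True

del list  # remove the module-level name; the built-in is visible again
print(list('abc'))  # ['a', 'b', 'c']
```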

    l = ['Apple', 'apple cider', 'apple juice', 'Mango', 'Mangosteen', 'Banana']
    basic_items = []  # The basic strings found so far (e.g. 'Apple', 'Mango')
    for list_item in l:  # Loop through all the items
        item_is_basic = True  # Assume the item is basic until proven otherwise
        for item in basic_items[:]:  # Iterate over a copy so items can be removed safely
            if item.lower() in list_item.lower():
                # The list item contains a basic item, so it is NOT basic
                item_is_basic = False
                break  # Stop looping through the basic items
            if list_item.lower() in item.lower():
                # The list item is contained in a basic item, so it is "more basic":
                # remove that item, but keep checking the others, since several
                # basic items may contain the list item
                basic_items.remove(item)

        if item_is_basic:
            # Finally, if the item is still considered basic, add it to basic_items
            basic_items.append(list_item)

    print(basic_items)  # outputs ['Apple', 'Mango', 'Banana']

In the end, you have the basic items in a separate list that you can work with.
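The same incremental idea can be wrapped in a reusable function. This is a minimal sketch; the name `basic_strings` is hypothetical. It removes every already-kept item that contains the new one, not just the first, which matters when a short string such as 'apple' arrives after several longer strings that contain it.

```python
def basic_strings(items):
    """Keep only items that contain no other item as a substring (case-insensitive)."""
    basic = []
    for candidate in items:
        low = candidate.lower()
        # Not basic if an already-kept item occurs inside it
        if any(kept.lower() in low for kept in basic):
            continue
        # Drop every kept item that contains the candidate, not just the first one
        basic = [kept for kept in basic if low not in kept.lower()]
        basic.append(candidate)
    return basic


print(basic_strings(['Apple', 'apple cider', 'apple juice', 'Mango', 'Mangosteen', 'Banana']))
# ['Apple', 'Mango', 'Banana']
print(basic_strings(['apple pie', 'apple juice', 'apple']))
# ['apple']
```

Items that displace longer ones are appended at the end, so the result is not always in input order: for ['apple pie', 'Mango', 'apple'] it returns ['Mango', 'apple'].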

Answer 1 (score: 0)

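Finding substrings is actually a well-known topic that you can easily look up on SO. I'll focus on the part where you want to end up with a list of unique core ingredients. The code below first sorts the items by length, so each string is processed before any longer string that contains it, which puts the basic building blocks at the front of the list.

Making basic_items a set may be redundant, but it at least guarantees that each item is stored only once. A set does not keep the original order, though.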

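    listt = ['Apple', 'apple cider', 'apple juice', 'Mango', 'Mangosteen', 'Banana']

    # Shorter strings first: a substring is never longer than the string containing it
    sorted_items = sorted(listt, key=len)

    basic_items = set()

    for val in sorted_items:
        # Keep val only if no already-kept (shorter) item occurs inside it, ignoring case
        if not any(x.lower() in val.lower() for x in basic_items):
            basic_items.add(val)

    # A set has no order, so rebuild the list in the original order
    listt = [item for item in listt if item in basic_items]
    print(listt)  # ['Apple', 'Mango', 'Banana']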
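As a cross-check, the requirement can also be stated directly: an item is basic when no other item occurs inside it. The sketch below (the function name `basic_strings_pairwise` is hypothetical) compares every pair of items, which costs O(n²) substring tests but needs no sorting and keeps the input order. For items that are equal when case is ignored, it keeps only the first occurrence.

```python
def basic_strings_pairwise(items):
    """Return items that contain no other item as a substring (case-insensitive)."""
    lowered = [s.lower() for s in items]
    result = []
    for i, s in enumerate(lowered):
        # s is not basic if a different, shorter item occurs inside it,
        # or if an identical item (ignoring case) appeared earlier in the list
        contained = any(
            j != i and other in s and (other != s or j < i)
            for j, other in enumerate(lowered)
        )
        if not contained:
            result.append(items[i])
    return result


print(basic_strings_pairwise(['Apple', 'apple cider', 'apple juice', 'Mango', 'Mangosteen', 'Banana']))
# ['Apple', 'Mango', 'Banana']
```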