Say I have a list:
list = ['Apple', 'apple cider', 'apple juice', 'Mango', 'Mangosteen', 'Banana']
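How can I detect whether a list item contains another list item as a substring, and then remove the longer item? The list should then look like this: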
list = ['Apple', 'Mango', 'Banana']
I only need the most basic version of each string in the list.
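Answer 0 (score: 0)
A few things. First, you shouldn't use list as a variable name, because it shadows the built-in list type. Also, I use lower() when comparing, since the case of the strings doesn't seem to matter.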
l = ['Apple', 'apple cider', 'apple juice', 'Mango', 'Mangosteen']
basic_items = []  # To save the basic strings (i.e. 'Apple', 'Mango')
for list_item in l:  # Loop through all the items
    item_is_basic = True  # True if the item is basic (which we assume beforehand)
    for item in basic_items[:]:  # Loop over a copy, since we may remove items from basic_items
        if list_item.lower() in item.lower():
            # If the list item is contained in a basic item, the list item is "more basic"
            basic_items.remove(item)  # Remove the item which is not considered basic anymore
        elif item.lower() in list_item.lower():
            # If the list item contains a basic item, the list item is NOT basic
            item_is_basic = False
            break  # We stop the loop through the basic items
    if item_is_basic:
        # Finally, if the item is considered basic, we add it to basic_items
        basic_items.append(list_item)
print(basic_items)  # outputs ['Apple', 'Mango']
In the end, you have the basic items in a separate list that you can use.
Answer 1 (score: 0)
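Finding substrings is a well-known topic that you can easily look up on SO. I'll focus on the part where you want to end up with a unique list of core ingredients. The code below first sorts the items by length, so that the basic building blocks come first in the list: a substring can never be longer than the string that contains it.
Turning basic_items into a set is probably redundant, but it at least guarantees a unique representation. Note that a set is unordered, so the final list may not keep the original order.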
listt = ['Apple', 'apple cider', 'apple juice', 'Mango', 'Mangosteen']
listt = sorted(listt, key=len)  # Shortest strings first
basic_items = set()
for val in listt:
    # Keep val only if no basic item found so far is a substring of it
    if not any(x.lower() in val.lower() for x in basic_items):
        basic_items.add(val)
listt = list(basic_items)
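If the order of the result matters (the question expects ['Apple', 'Mango', 'Banana']), the same sort-by-length idea can be wrapped in a function that filters the original list in place of converting a set back to a list. This is only a sketch: the name basic_strings is made up for illustration, and it assumes that case-insensitive duplicates should collapse to the first occurrence.

```python
def basic_strings(items):
    """Return the items that do not contain any other item as a substring.

    Hypothetical helper: comparison is case-insensitive and the
    original order of the surviving items is preserved.
    """
    kept = []  # lowercase forms of the basic strings found so far
    # Shortest strings first: a substring is never longer than its container.
    for val in sorted(items, key=len):
        low = val.lower()
        if not any(k in low for k in kept):
            kept.append(low)

    remaining = set(kept)
    result = []
    for val in items:  # walk the input again to keep its original order
        low = val.lower()
        if low in remaining:
            result.append(val)
            remaining.discard(low)  # assumption: keep only the first case-insensitive duplicate
    return result


fruits = ['Apple', 'apple cider', 'apple juice', 'Mango', 'Mangosteen', 'Banana']
print(basic_strings(fruits))  # ['Apple', 'Mango', 'Banana']
```

Because the items are sorted by length before comparing, the result does not depend on whether 'Apple' appears before or after 'apple cider' in the input.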