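I have a list of phrases that contain conjunctions such as (and, or, /, &). I want to expand each of them into all possible individual phrases. What is the best way to expand phrases that contain conjunctions, using either an NLP library or a plain Python function? For example, "alphabet a/b/c can have color red/blue/green" can be expanded into nine phrases:
["alphabet a can have color red", "alphabet a can have color blue",... "alphabet b can have color blue",..."alphabet c can have color green"]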
Other examples:
['bag of apples/oranges', 'case of citrus (lemon or limes)',
'chocolates/candy box' , 'bag of shoes & socks',
'pear red/brown/green', 'match box and/or lighter',
'milkshake (soy and almond) added ']
should be expanded to
['bag of apples','bag of oranges',
'case of citrus lemon', 'case of citrus limes',
'chocolates box' , 'candy box' ,'bag of socks',
'bag of shoes', 'pear red', 'pear brown',
'pear green', 'match box ', 'lighter',
'milkshake almond added', 'milkshake soy added']
Answer 0 (score: 0)
A brute-force approach can always be used to solve this problem, although I was looking for something smarter.
def expand_by_conjuction(item):
    def get_slash_index(words):
        # Index of the first word that contains a slash.
        for num, ele in enumerate(words):
            if "/" in ele:
                return num

    items = [item]
    # Keep expanding until no phrase contains a slash.
    while any("/" in phrase for phrase in items):
        # Iterate over a copy, because items is modified inside the loop.
        for item in list(items):
            words = item.split()
            if any("/" in ele for ele in words):
                sls_index = get_slash_index(words)
                split_conjucted = words[sls_index].split("/")
                # Build one new phrase per alternative of the slashed word.
                for part in split_conjucted:
                    n_item = words[:sls_index] + [part] + words[sls_index + 1:]
                    items.append(" ".join(n_item))
                # Drop the unexpanded phrase.
                items.remove(item)
    return items

def slashize_conjuctions(item):
    # Replace word conjunctions with "/" so that only one form has to be expanded.
    slashize = [' or ', ' and ', ' and/or ', ' or/and ', ' & ']
    for conj in slashize:
        if conj in item:
            item = item.replace(conj, "/")
    return item

items = ['bag of apples/oranges', 'case of citrus (lemon or limes)',
         'chocolates/candy box', 'bag of shoes & socks',
         'pear red/brown/green', 'match box and/or lighter',
         'milkshake (soy and almond) added ']

new_items = []
for string in items:
    item = slashize_conjuctions(string)
    lst = expand_by_conjuction(item)
    # Strip the parentheses that grouped the alternatives.
    lst = [ele.replace("(", "").replace(")", "") for ele in lst]
    new_items.extend(lst)
    # print(f'String:{string} ITEM:{item} --> list{lst}')
print(new_items)
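
The same slash expansion can probably be written more compactly as a Cartesian product over the alternatives of each word. The following is only a minimal sketch under a few assumptions: `normalize` and `expand` are hypothetical names that do not appear in the code above, the conjunction list is the same one used in `slashize_conjuctions`, and parentheses are simply dropped. Like the brute-force version, it only combines the words directly adjacent to a conjunction, so 'match box and/or lighter' yields 'match box' and 'match lighter' rather than the 'lighter' asked for in the question.

```python
import itertools
import re

# Conjunctions surrounded by whitespace; same set as in slashize_conjuctions.
_CONJ = re.compile(r"\s+(?:and/or|or/and|and|or|&)\s+")


def normalize(phrase):
    """Drop parentheses and rewrite word conjunctions as '/' (hypothetical helper)."""
    phrase = phrase.replace("(", "").replace(")", "")
    return _CONJ.sub("/", phrase.strip())


def expand(phrase):
    """Return every combination of the slash-separated alternatives."""
    # Each word becomes a list of alternatives; words without a slash have one.
    choices = [word.split("/") for word in normalize(phrase).split()]
    return [" ".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    for phrase in ["alphabet a/b/c can have color red/blue/green",
                   "milkshake (soy and almond) added "]:
        print(expand(phrase))
```

Because `itertools.product` enumerates all combinations in one pass, no list has to be modified while it is being iterated, and a phrase with several slashed words needs no repeated scanning.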