Expanding phrases that contain conjunctions such as (/, and, or, &)

Time: 2019-02-07 19:49:42

Tags: python string nlp

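I have a list of phrases that contain conjunctions such as (and, or, /, &). I want to expand each of them into all possible individual phrases. What is the best way to expand phrases containing conjunctions, using either an NLP library or a plain Python function? For example, "alphabet a/b/c can have color red/blue/green" can be expanded into nine phrases: ["alphabet a can have color red", "alphabet a can have color blue", ... "alphabet b can have color blue", ... "alphabet c can have color green"].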

Other examples:

    ['bag of apples/oranges', 'case of citrus (lemon or limes)',
     'chocolates/candy box', 'bag of shoes & socks',
     'pear red/brown/green', 'match box and/or lighter',
     'milkshake (soy and almond) added ']

These should be expanded to:

    ['bag of apples', 'bag of oranges',
     'case of citrus lemon', 'case of citrus limes',
     'chocolates box', 'candy box', 'bag of socks',
     'bag of shoes', 'pear red', 'pear brown',
     'pear green', 'match box ', 'lighter',
     'milkshake almond added', 'milkshake soy added']

1 answer:

Answer 0: (score: 0)

A brute-force approach will always solve this. I am still looking for something smarter.

def expand_by_conjuction(item):
    """Expand every slash-joined token in `item` into separate phrases."""
    def get_slash_index(tokens):
        # Index of the first token that contains "/", or None if there is none.
        for num, ele in enumerate(tokens):
            if "/" in ele:
                return num
        return None

    pending = [item]   # phrases that may still contain a "/"
    expanded = []      # phrases with no "/" left
    # Use a work queue instead of appending to and removing from a list
    # while iterating over it.
    while pending:
        tokens = pending.pop(0).split()
        sls_index = get_slash_index(tokens)
        if sls_index is None:
            expanded.append(" ".join(tokens))
            continue
        # Queue one new phrase per alternative of the first slashed token.
        for part in tokens[sls_index].split("/"):
            n_item = tokens[:sls_index] + [part] + tokens[sls_index + 1:]
            pending.append(" ".join(n_item))
    return expanded

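def slashize_conjuctions(item):
    # Rewrite each spelled-out conjunction as "/" so that a single
    # expansion routine can handle all of them.
    # Compound forms are listed first as a precaution.
    slashize = [' and/or ', ' or/and ', ' and ', ' or ', ' & ']
    for conj in slashize:
        item = item.replace(conj, "/")
    return item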


items = ['bag of apples/oranges', 'case of citrus (lemon or limes)',
         'chocolates/candy box', 'bag of shoes & socks',
         'pear red/brown/green', 'match box and/or lighter',
         'milkshake (soy and almond) added ']

new_items = []
for string in items:
    # Normalize conjunctions to "/", expand, then strip the parentheses.
    item = slashize_conjuctions(string)
    lst = expand_by_conjuction(item)
    lst = [ele.replace("(", "").replace(")", "") for ele in lst]
    new_items.extend(lst)
print(new_items)
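
For what it is worth, the same token-level idea can probably be written more compactly with `itertools.product`, since the expansion is just a Cartesian product over the alternatives of each token. The following is only a sketch: `expand_phrase` and `_CONJ` are names made up for this example, and it assumes that every "and", "or", "&" and "/" separates alternatives, which will not hold for arbitrary text.

```python
import itertools
import re

# Assumption: each of these connectives separates alternatives.
# Compound forms come first so "and/or" is consumed as a single unit.
_CONJ = re.compile(r"\s*(?:\band/or\b|\bor/and\b|\band\b|\bor\b|&|/)\s*")

def expand_phrase(phrase):
    """Return every phrase obtained by picking one alternative per token."""
    # Drop parentheses, then rewrite every connective as a bare "/".
    cleaned = phrase.replace("(", " ").replace(")", " ")
    cleaned = _CONJ.sub("/", cleaned)
    # Each whitespace-separated token becomes a list of its alternatives.
    choices = [tok.split("/") for tok in cleaned.split()]
    # The Cartesian product picks one alternative from every token.
    return [" ".join(combo) for combo in itertools.product(*choices)]

for p in expand_phrase("alphabet a/b/c can have color red/blue/green"):
    print(p)
```

Like the brute-force version, this only looks at single tokens, so "match box and/or lighter" becomes "match box" and "match lighter" rather than "match box" and "lighter". Deciding how many words a conjunction covers would need real parsing, which this sketch does not attempt.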