Python: Find common sentence fragments in a list of sentences

Time: 2018-09-13 10:12:38

Tags: python list set-intersection

Say I have a list of sentences like this:

quick brown blah red work word
quick brown too red blah someone
quick gray one two three
quick gray two three four
quick gray johnson week summer
quick gray johnson day week water fall
quick gray wicked stopper fall
quick gray hotel flamer walk
doggie bone
doggie python
doggie python tree flower python
doggie python flower whatever
tree bone stick

I'm looking for code that returns the list of common "parent" sentences:

quick brown
quick gray
quick gray johnson
doggie bone
doggie python
tree bone stick

THX

2 Answers:

Answer 0: (score: 0)

Here you go:

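def removeNumbers(data):
    """Keep only the words that come before the first integer token in each sentence."""
    result = []
    for sent in data:
        temp = []
        for word in sent.split():
            try:
                int(word)  # raises ValueError for non-numeric words
            except ValueError:
                temp.append(word)
            else:
                break  # first number reached: stop collecting words
        result.append(" ".join(temp))
    return result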
data = [
    'quick brown 580 650 040 050',
    'quick brown 650 160 150 500',
    'quick gray 075 060 400',
    'quick gray 087 565 600',
    'quick gray johnson 149 135',
    'quick gray johnson 600 650 070 600',
    'quick gray 565 070 250',
    'quick gray 630 550 400',
    'doggie 256',
    'doggie python',
    'doggie python 350 675 106',
    'doggie python 417 560',
    'tree 196 106'
]
data = removeNumbers(data)
# set() removes the duplicates; note that the order of the printed list is arbitrary
print(list(set(data)))
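
Because `set` does not preserve order, the printed list can come out shuffled. If the original order matters, one option is to deduplicate with `dict.fromkeys` instead. The following is only a sketch: `strip_numeric_tail` and `sample` are hypothetical names, and `str.isdigit` is swapped in for the `int()` check, so tokens such as `-5` would be treated as words rather than numbers.

```python
def strip_numeric_tail(sentences):
    """Hypothetical helper: keep the words before the first all-digit token."""
    prefixes = []
    for sentence in sentences:
        words = []
        for word in sentence.split():
            if word.isdigit():  # assumption: numbers are plain digit runs
                break
            words.append(word)
        prefixes.append(" ".join(words))
    # dict.fromkeys keeps the first occurrence of each key in insertion order
    # (guaranteed for dicts since Python 3.7)
    return list(dict.fromkeys(prefixes))


sample = [
    'quick brown 580 650',
    'quick brown 650 160',
    'quick gray 075 060',
    'doggie 256',
    'doggie python',
]
print(strip_numeric_tail(sample))
# ['quick brown', 'quick gray', 'doggie', 'doggie python']
```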

Answer 1: (score: 0)

You can do this easily with a regular expression:

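>>> import re
>>> # `data` is assumed to be the list of strings with numeric suffixes from the other answer
>>> result = []
>>> for i in data:
...     # match the first run of lowercase words; the match stops at the first digit
...     r = re.search(r'([a-z]+\s*)+', i)
...     if r:
...         res = r.group(0).strip()
...         if res not in result:
...             result.append(res)
...
>>> print(result)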
['quick brown', 'quick gray', 'quick gray johnson', 'doggie', 'doggie python', 'tree']
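
Both answers rely on the numeric suffixes in their sample data, so they do not apply directly to the word-only list in the question. One possible reading of the question is sketched below. It rests on two assumptions that the question does not state: related sentences sit next to each other in the list, and a shared prefix only counts as a "parent" when it has at least two words. A sentence with no such neighbour is kept whole. The names `shared_prefix`, `parent_sentences`, and `min_words` are hypothetical.

```python
def shared_prefix(a, b):
    """Return the leading words that two word lists have in common."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


def parent_sentences(sentences, min_words=2):
    """Hypothetical helper: collect prefixes shared by neighbouring sentences.

    Assumes related sentences are adjacent and that a shared prefix must
    have at least `min_words` words to count as a parent.
    """
    rows = [s.split() for s in sentences]
    parents = []
    for i, row in enumerate(rows):
        with_prev = shared_prefix(rows[i - 1], row) if i > 0 else []
        with_next = shared_prefix(row, rows[i + 1]) if i + 1 < len(rows) else []
        if len(with_next) >= min_words:
            candidate = " ".join(with_next)
        elif len(with_prev) >= min_words:
            continue  # already covered by the prefix shared with the previous line
        else:
            candidate = " ".join(row)  # no close neighbour: keep the sentence whole
        if candidate not in parents:
            parents.append(candidate)
    return parents


sentences = [
    'quick brown blah red work word',
    'quick brown too red blah someone',
    'quick gray one two three',
    'quick gray two three four',
    'quick gray johnson week summer',
    'quick gray johnson day week water fall',
    'quick gray wicked stopper fall',
    'quick gray hotel flamer walk',
    'doggie bone',
    'doggie python',
    'doggie python tree flower python',
    'doggie python flower whatever',
    'tree bone stick',
]
print(parent_sentences(sentences))
# ['quick brown', 'quick gray', 'quick gray johnson', 'doggie bone', 'doggie python', 'tree bone stick']
```

With a different threshold or an unsorted list the result changes, so this is a heuristic that happens to reproduce the expected output in the question, not a general definition of a "parent" sentence.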