Question

Say I have a list and a string:

l=['hello my name is michael',
'hello michael is my name',
'hello michaela is my name',
'hello my name is michelle',
"hello i'm Michael",
'hello my lastname is michael',
'hello michael',
'hello my name is michael brown']

s="hello my name is michael"

I want to look up each word of the string s in every list element and count how many of those words appear in each element.

hello my name is michael: 5
hello michael is my name: 5 (all words are present)
hello michaela is my name: 5 (extra characters at end of word are Ok)
hello my name is michelle: 4 
hello i'm Michael: 2 
hello my lastname is michael: 4 (extra characters at the start of a word are not Ok)
hello michael: 2
hello my name is michael brown: 5

Finally, I'd like to return all matches ordered by count, highest first. So the output would be:

hello my name is michael: 5
hello michael is my name: 5
hello michaela is my name: 5
hello my name is michael brown: 5
hello my name is michelle: 4 
hello my lastname is michael: 4
hello i'm Michael: 2 
hello michael: 2

This is essentially a regex matching and sorting problem, but it is over my head. Any suggestions on how to approach any or all of these steps?

Answer 1

I don't fully understand your expected output. Is this what you mean?

import re

l = ['hello my name is michael',
    'hello michael is my names',
    'hello michaela is my name',
    'hello my name is michelle',
    'hello i am Michael',
    'hello my lastname is michael',
    'hello michael',
    'hello my name is michael brown']

s = "Hello my name is Michael"

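# Compare case-insensitively, one query word at a time.
words = s.lower().split()
for item in l:
    text = item.lower()
    count = 0
    for word in words:
        # \b anchors the match at the start of a word, so "lastname" does not
        # count for "name"; \w* allows extra trailing characters ("michaela").
        # re.escape keeps any regex metacharacters in the word literal.
        if re.search(r"\b" + re.escape(word) + r"\w*", text):
            count += 1
    print(item, count)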
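
The loop above prints the counts in list order, while the question also asks for the highest count first. Here is a minimal sketch of that last step, assuming "a word counts if some word in the element starts with it" is the intended rule. The helper names `count_matches` and `rank` are made up for illustration, and the list is the one from the question.

```python
import re


def count_matches(query, item):
    """Count how many words of `query` begin some word of `item` (case-insensitive)."""
    text = item.lower()
    return sum(
        1
        for word in query.lower().split()
        # \b anchors at the start of a word; \w* allows trailing characters.
        if re.search(r"\b" + re.escape(word) + r"\w*", text)
    )


def rank(query, items):
    """Return (item, count) pairs, highest count first; ties keep list order."""
    counted = [(item, count_matches(query, item)) for item in items]
    # sorted() is stable, also with reverse=True, so equal counts stay in input order.
    return sorted(counted, key=lambda pair: pair[1], reverse=True)


items = [
    "hello my name is michael",
    "hello michael is my name",
    "hello michaela is my name",
    "hello my name is michelle",
    "hello i'm Michael",
    "hello my lastname is michael",
    "hello michael",
    "hello my name is michael brown",
]

for item, count in rank("hello my name is michael", items):
    print(f"{item}: {count}")
```

Because the sort is stable, elements with the same count come out in their original list order, which is the order shown in the question's desired output.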
