I have two lists, as follows:
c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']
I want to check each item of c against each item of isl so that I get all partial string matches.
The output I need looks like this:
out = ["john", "query 989877", "tamm"]
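As you can see, the output should also include partial matches: "query 989877" is only part of "query 989877 forcast".
I tried the following: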
out = []
for word in c:
    for w in isl:
        if word.lower() in w.lower():
            out.append(word)
But this only outputs:
out = ["John", "Tamm"]
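The middle item is missed because `a in b` between two strings asks whether the whole of `a` occurs inside `b`. A minimal check using the lists from the question shows that the full phrase fails while part of it succeeds:

```python
# The lists from the question
c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']

# `a in b` on strings tests whether the ENTIRE string a appears inside b,
# so the full phrase is not found even though part of it is present.
whole_phrase_found = 'query 989877 forcast' in isl[0].lower()
part_found = 'query 989877' in isl[0].lower()

print(whole_phrase_found)  # False
print(part_found)          # True
```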
I also tried the following:
print([word for word in c if word.lower() in (e.lower() for e in isl)])
But this only outputs "John". How do I get what I want?
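The second attempt returns only "John" because `x in <generator or list of strings>` compares `x` with each element using `==`; it does not search inside the elements. A small sketch (a list is used here instead of a generator, but membership behaves the same way):

```python
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']

lowered = [e.lower() for e in isl]

# Membership in a collection is an equality test against each element.
print('john' in lowered)  # True:  'john' == 'john'
print('tamm' in lowered)  # False: 'tamm' != 'tamm ju'

# A substring search has to be spelled out per element instead.
substring_hit = any('tamm' in e for e in lowered)
print(substring_hit)      # True
```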
Answer 0 (score: 3)
Maybe something like this:
def get_sub_strings(s):
    words = s.split()
    for i in range(1, len(words) + 1):  # shortest runs first; reverse this order for longest first
        for n in range(0, len(words) + 1 - i):
            yield ' '.join(words[n:n + i])

out = []
for word in c:
    for sub in get_sub_strings(word.lower()):
        for s in isl:
            if sub in s.lower():
                out.append(sub)

print(out)
# ['john', 'query', '989877', 'query 989877', 'tamm']
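To see exactly which candidates `get_sub_strings` produces, here is a sketch with a hypothetical list-returning helper `word_runs` that yields the same contiguous word runs in the same order (shortest first). A string of n words has n(n+1)/2 such runs, so 6 for a three-word item:

```python
def word_runs(s):
    """Hypothetical helper: every contiguous run of words in s, shortest first."""
    words = s.split()
    n = len(words)
    return [' '.join(words[start:start + length])
            for length in range(1, n + 1)        # run length: 1 word, 2 words, ...
            for start in range(n - length + 1)]  # every start position for that length

subs = word_runs('query 989877 forcast')
print(subs)
# ['query', '989877', 'forcast', 'query 989877', '989877 forcast', 'query 989877 forcast']
```

Every run is tested against every isl entry, so the cost grows quadratically with the number of words per item, which is negligible for short phrases like these.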
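If you only want to keep the longest match, generate the substrings in reverse order (longest first) and break as soon as one of them is found in isl: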
def get_sub_strings(s):
    words = s.split()
    # longest runs first, so the first hit is the largest match
    for i in range(len(words), 0, -1):
        for n in range(0, len(words) + 1 - i):
            yield ' '.join(words[n:n + i])

out = []
for word in c:
    for sub in get_sub_strings(word.lower()):
        if any(sub in s.lower() for s in isl):
            out.append(sub)
            break
print(out)
# ['john', 'query 989877', 'tamm']
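Plain `in` also matches inside words (for example "ann" would match "Anne"). If whole-word matches are wanted, one possible variant (an assumption, not something this answer proposes) is to precompute the word runs of every isl entry into a set once and look each candidate up in it. The names `word_runs` and `isl_runs` are hypothetical:

```python
c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']

def word_runs(s):
    """Hypothetical helper: all contiguous word runs of s, longest first."""
    words = s.split()
    n = len(words)
    return [' '.join(words[start:start + length])
            for length in range(n, 0, -1)
            for start in range(n - length + 1)]

# Every word run that occurs in any isl entry, lower-cased, for O(1) lookups.
isl_runs = {run for s in isl for run in word_runs(s.lower())}

out = []
for word in c:
    # word_runs yields longest first, so the first hit is the longest match.
    match = next((run for run in word_runs(word.lower()) if run in isl_runs), None)
    if match is not None:
        out.append(match)

print(out)  # ['john', 'query 989877', 'tamm']
```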
Answer 1 (score: 0)
OK, I've come up with something! It's a very hacky approach and I don't like it myself, but it does give me the output:
Step 1:
c1 = []
for r in c:
    c1.append(r.split())
# c1 = [['John'], ['query', '989877', 'forcast'], ['Tamm']]
Step 2:
p = []
for w in isl:
    for word in c1:
        for w1 in word:
            if w1.lower() in w.lower():
                p.append(w1)
# p = ['query', '989877', 'John', 'Tamm']
Step 3:
out = []
for word in c:
    t = []
    for i in p:
        if i in word:
            t.append(i)
    out.append(t)
# out = [['John'], ['query', '989877'], ['Tamm']]
Step 4:
out_final = []
for i in out:
    out_final.append(" ".join(i))
# out_final = ['John', 'query 989877', 'Tamm']
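Steps 1 to 4 can be collapsed into a single comprehension. The sketch below, with a hypothetical helper `matched_words`, checks each item's own words directly instead of building the intermediate list `p`, so it is the same idea rather than a line-for-line equivalent. Like the steps above, it keeps the original casing and joins every matching word, even words that are not adjacent:

```python
c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']

isl_lower = [s.lower() for s in isl]

def matched_words(item):
    """Hypothetical helper: keep the words of item (original case) found in some isl entry."""
    return ' '.join(w for w in item.split()
                    if any(w.lower() in s for s in isl_lower))

out_final = [matched_words(item) for item in c]
print(out_final)  # ['John', 'query 989877', 'Tamm']
```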