I have a string and a list, as shown below:
text = 'Sherlock Holmes. PARIS. Address: 221B Baker Street, london. Solving case in Madrid.'
city = ['Paris', 'London', 'Madrid']
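I want to extract the first element of the list that appears in the text after the word Address.
This is how I approached the problem using nltk: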
import nltk

loc = None
flag = False
for word in nltk.word_tokenize(text):
    # Start looking for cities only once 'Address' has been seen.
    if word == 'Address':
        flag = True
    if flag:
        if word.capitalize() in city:
            loc = word
            break
print(loc)
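The result I get from the above is london.
In my real use case, however, the text is very large and the list of cities is very long. Is there a better way to do this?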
Answer 0 (score: 1)
The lowest hanging fruit I see is that you can turn city into a set for constant-time membership checks. Besides that, consider using next with a default argument to return the first matching city.
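city = {'Paris', 'London', 'Madrid'}
# Keep only the text after the first 'Address' (empty string if it is absent).
text = text.partition('Address')[-1].strip()
# next() stops at the first match and returns None if there is none.
print(next((w for w in nltk.word_tokenize(text) if w.capitalize() in city), None))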
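If the text is very large, it may also help to avoid tokenizing all of it up front. The following is only a sketch using the standard library instead of nltk: it swaps nltk.word_tokenize for a lazy re.finditer scan, so the search stops at the first hit. The function name first_city_after and its marker parameter are made up for illustration, and it assumes every city name is a single word.

```python
import re

def first_city_after(text, cities, marker="Address"):
    """Return the first word after `marker` that names a city, else None.

    Hypothetical helper; assumes single-word city names.
    """
    # Normalise the city names once so each lookup is a constant-time set check.
    wanted = {c.casefold() for c in cities}
    # Keep only the text after the first marker; empty string if it is absent.
    tail = text.partition(marker)[2]
    # re.finditer yields words lazily, so scanning stops at the first match.
    words = (m.group() for m in re.finditer(r"\w+", tail))
    return next((w for w in words if w.casefold() in wanted), None)

text = ('Sherlock Holmes. PARIS. Address: 221B Baker Street, london. '
        'Solving case in Madrid.')
city = ['Paris', 'London', 'Madrid']
print(first_city_after(text, city))  # london
```

Comparing with casefold() on both sides also matches spellings such as MADRID, which word.capitalize() would handle too, but without rebuilding the word on every check.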