Extract the first element of a list that appears after a specific word

Time: 2018-08-27 05:46:59

Tags: python python-3.x pattern-matching nltk

I have a string and a list, as shown below:

text = 'Sherlock Holmes. PARIS. Address: 221B Baker Street, london. Solving case in Madrid.'

city = ['Paris', 'London', 'Madrid']

I want to extract the first element of the list that appears in the text after the word Address.

Here is how I solved the problem using nltk:

import nltk

loc = None
flag = False
for word in nltk.word_tokenize(text):
    if word == 'Address':
        flag = True

    if flag:
        if word.capitalize() in city:
            loc = word
            break

print(loc)

The result of the code above is london.

In practice, however, my text is very large and the list of cities is very long. Is there a better way to do this?

1 Answer:

Answer 0: (score: 1)

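The lowest-hanging fruit I see is turning city into a set, which gives constant-time membership checks on average instead of a linear scan of the list. Beyond that, consider using next with a default argument: it returns the first matching city and stops iterating there, or returns the default (None here) when nothing matches.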

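import nltk

city = {'Paris', 'London', 'Madrid'}
# Loop only while 'Address' still occurs, so no trailing None is printed
# once the remaining text no longer contains the keyword.
while 'Address' in text:
    # Keep only the text after the next occurrence of 'Address'.
    text = text.partition('Address')[-1].strip()
    print(
        next((w for w in nltk.word_tokenize(text) if w.capitalize() in city), None))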
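For very large texts, the same idea can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the answer's method: the helper name first_city_after is hypothetical, nltk.word_tokenize is swapped for a simple regular expression that matches runs of ASCII letters, and matching is done on lowercased single-word city names rather than with capitalize().

```python
import re


def first_city_after(text, cities, keyword="Address"):
    """Return the first word after `keyword` that names a city, else None.

    Hypothetical helper for illustration; it assumes single-word city names.
    """
    # Normalise the city names once so each lookup is a set membership check.
    wanted = {c.lower() for c in cities}
    # Everything after the first occurrence of the keyword ('' if it is absent).
    tail = text.partition(keyword)[-1]
    # re.finditer yields matches lazily, so scanning stops at the first hit.
    words = (m.group() for m in re.finditer(r"[A-Za-z]+", tail))
    return next((w for w in words if w.lower() in wanted), None)


text = 'Sherlock Holmes. PARIS. Address: 221B Baker Street, london. Solving case in Madrid.'
city = ['Paris', 'London', 'Madrid']
print(first_city_after(text, city))  # london
```

Because both the regex matches and the filter are generators, only the words up to the first city are ever produced, rather than a full token list for the whole text.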