Question

I have a string and a list, as shown below:

text = 'Sherlock Holmes. PARIS. Address: 221B Baker Street, london. Solving case in Madrid.'

city = ['Paris', 'London', 'Madrid']

I want to extract the first element of the list that appears in the text after the word Address.

This is how I solved it using nltk:

import nltk

loc = None
flag = False
for word in nltk.word_tokenize(text):
    if word == 'Address':
        flag = True

    if flag:
        if word.capitalize() in city:
            loc = word
            break

print(loc)

The result I get from the above is london.

In my real use case, however, the text is very large and the city list is very long. Is there a better way to do this?

Answer 1

The lowest-hanging fruit I see is turning city into a set, which gives constant-time membership checks. Beyond that, consider using next with a default argument to return the first matching city, or None if there is none.

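import nltk

# A set gives constant-time membership checks; text is the string from the question.
city = {'Paris', 'London', 'Madrid'}
while text:
    # Keep only what follows the next occurrence of 'Address' ('' if there is none).
    text = text.partition('Address')[-1].strip()
    if text:  # skip the final pass, where no 'Address' remains
        print(
            next((w for w in nltk.word_tokenize(text) if w.capitalize() in city), None))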
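If tokenizing a very large text up front is a concern, the same idea can be sketched without nltk. This is only a rough alternative, not what the code above does: it swaps nltk.word_tokenize for a plain regex word scan via re.finditer, which yields matches lazily and so stops at the first hit. The function name first_city_after and its keyword parameter are made up for illustration, and it assumes single-word city names.

```python
import re

def first_city_after(text, cities, keyword="Address"):
    """Return the first word after `keyword` that names a city, else None."""
    # Normalize the city names once so each lookup is a constant-time set check.
    wanted = {c.casefold() for c in cities}
    # Skip everything up to and including the first occurrence of the keyword.
    _, found, tail = text.partition(keyword)
    if not found:
        return None
    # re.finditer yields words lazily, so the scan stops at the first hit.
    for match in re.finditer(r"\w+", tail):
        if match.group().casefold() in wanted:
            return match.group()
    return None

text = 'Sherlock Holmes. PARIS. Address: 221B Baker Street, london. Solving case in Madrid.'
city = ['Paris', 'London', 'Madrid']
print(first_city_after(text, city))  # prints: london
```

Comparing case-folded strings instead of calling capitalize() also avoids a pitfall with mixed-case names: 'McAllen'.capitalize() gives 'Mcallen', which would no longer match the entry in the set.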
