Question

I am trying to extract only city names from a text, so I am using the geograpy library in Python, but the output also contains some other names. Here is my code:

from geograpy.extraction import Extractor
text6 = u"""Some text..."""
e6 = Extractor(text=text6)
e6.find_entities()
print(e6.places)

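Input text:

Opposition Leader Mahinda Rajapaksa says that the whole public administration has collapsed due to the constitution council’s arbitrary actions. The Opposition Leader said so in response to a query a journalised raised after a meeting held...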

Output:

['Opposition', 'Leader Mahinda Rajapaksa', 'Opposition Leader']

There are no city names in this text, so the output should be empty.

Answer 1

As a committer of geograpy3, I reproduced your issue by adding a test to the latest geograpy3 (https://github.com/somnathrakshit/geograpy3/blob/master/tests/test_extractor.py) and opened an issue for it:

https://github.com/somnathrakshit/geograpy3/issues/3, which was fixed with this commit.

The result is now:

[]

as requested. The test:

    def testStackoverflow54712198(self):
        '''
        see https://stackoverflow.com/questions/54712198/not-only-extracting-places-from-a-text-but-also-other-names-in-geograpypython
        '''
        text='''Opposition Leader Mahinda Rajapaksa says that the whole public administration has collapsed due to the constitution council’s arbitrary actions. The Opposition Leader said so in response to a query a journalised raised after a meeting held...'''
        e=Extractor(text)
        places=e.find_geoEntities()
        if self.debug:
            print(places)
        self.assertEqual([],places)

Answer 2

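Reading its source code shows that places is an attribute of the Extractor class and that it is set later, in the find_entities method. In find_entities, the library collects every named entity of type 'GPE', 'PERSON', or 'ORGANIZATION' and puts it into places.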

These named-entity types are defined in section "5 Named Entity Recognition" of the NLTK book:

"GPE": geo-political entities such as city, state/province, and country

So you can see that find_entities is not specific to city names: people and organizations end up in places as well.
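To make the label filtering concrete, here is a minimal sketch that keeps only 'GPE' chunks from NLTK's named-entity chunker. It is not geograpy's actual code; the helper names `names_with_labels` and `gpe_names` are hypothetical, and `gpe_names` assumes NLTK is installed along with its tokenizer, POS tagger, and NE chunker data. Keep in mind that 'GPE' also covers states and countries, and that the chunker can mislabel words, so this narrows the result but does not guarantee cities only.

```python
def names_with_labels(chunks, wanted=("GPE",)):
    """Return the text of every named-entity chunk whose label is in `wanted`.

    `chunks` is expected to look like the output of nltk.ne_chunk:
    entity chunks expose label() and leaves(), while ordinary tokens
    are plain (word, pos_tag) tuples and are skipped.
    """
    names = []
    for chunk in chunks:
        if hasattr(chunk, "label") and chunk.label() in wanted:
            # leaves() yields (word, pos_tag) pairs; join the words back together
            names.append(" ".join(word for word, _tag in chunk.leaves()))
    return names


def gpe_names(text):
    """Hypothetical helper: run NLTK's pipeline and keep only GPE entities."""
    import nltk  # imported here so the filter above is usable without NLTK

    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    return names_with_labels(nltk.ne_chunk(tagged), wanted=("GPE",))
```

Passing `wanted=("GPE", "PERSON", "ORGANIZATION")` to `names_with_labels` would mimic the broader selection described above, which is why person and organization names appear in the question's output.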

To find city names in a piece of text, follow the example here.
