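I have a file containing some sentences. I used polyglot for named entity recognition and stored all the detected entities in a list. Now I want to check, for each sentence, whether it contains any entity or pair of entities, and display them.
Here is what I did: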
from polyglot.text import Text

file = open('input_raw.txt', 'r')
input_file = file.read()
test = Text(input_file, hint_language_code='fa')

list_entity = []
for sent in test.sentences:
    #print(sent[:10], "\n")
    for entity in test.entities:
        list_entity.append(entity)

for i in range(len(test)):
    m = test.entities[i]
    n = test.words[m.start: m.end]  # it shows only word not tag
    if str(n).split('.')[-1] in test:  # if each entities exist in each sentence
        print(n)
Input:
sentence1: Bill Gate is the founder of Microsoft.
sentence2: Trump is the president of USA.
Expected output:
Bill Gate, Microsoft
Trump, USA
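
For reference, here is a rough sketch of the per-sentence grouping I am after. It assumes that each polyglot sentence exposes its own `entities` property and that each entity is a sequence of word strings; I have not confirmed this against my data. The names `entities_per_sentence` and `print_entities` are hypothetical helpers made up for illustration, not part of polyglot.

```python
def entities_per_sentence(sentences):
    """Return one list of entity strings per sentence.

    `sentences` is any iterable of objects with an `entities` attribute,
    where each entity is a sequence of word strings (assumed to match
    what polyglot's sentence objects provide).
    """
    result = []
    for sent in sentences:
        # Join the words of each entity chunk into a single string.
        names = [" ".join(entity) for entity in sent.entities]
        result.append(names)
    return result


def print_entities(path, lang="fa"):
    # Hypothetical helper: polyglot is imported lazily so that
    # entities_per_sentence can be used without it installed.
    from polyglot.text import Text

    with open(path, "r", encoding="utf-8") as f:
        text = Text(f.read(), hint_language_code=lang)
    for names in entities_per_sentence(text.sentences):
        if names:  # skip sentences with no detected entity
            print(", ".join(names))
```

Calling `print_entities('input_raw.txt')` should then print one comma-separated line per sentence that has at least one detected entity, provided the NER model for the hinted language is installed and actually tags those names.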