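I have a file containing some sentences. I am using polyglot for named entity recognition and storing all detected entities in a list. Now I want to check, for each sentence, whether any entity or pair of entities occurs in it, and display them.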
Here is what I have done:
```python
from polyglot.text import Text
file = open('input_raw.txt', 'r')
input_file = file.read()
test = Text(input_file, hint_language_code='fa')
list_entity = []
for sent in test.sentences:
    #print(sent[:10], "\n")
    for entity in test.entities:
        list_entity.append(entity)
for i in range(len(test)):
    m = test.entities[i]
    n = test.words[m.start: m.end] # it shows only word not tag
    if str(n).split('.')[-1] in test: # if each entities exist in each sentence
        print(n)
```
It gives me an empty list.
Input:
sentence1: Bill Gate is the founder of Microsoft.
sentence2: Trump is the president of USA.
Expected output:
Bill Gate, Microsoft
Trump, USA
Output of list_entity:
I-PER(['Trump']), I-LOC(['USA'])
How can I check whether I-PER(['Trump']), I-LOC(['USA']) are in the first sentence?
Answer:
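For starters, you are adding the entities of the whole text file to your entity list, once per sentence. Call `entities` on each sentence of the polyglot object instead, so that every entity is tied to the sentence it was found in.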
```python
from polyglot.text import Text

with open('input_raw.txt', 'r', encoding='utf-8') as f:
    input_file = f.read()

file = Text(input_file, hint_language_code='fa')
list_entity = []
# Ask each sentence for its own entities instead of the whole text's.
for sentence in file.sentences:
    for entity in sentence.entities:
        # print(entity)
        list_entity.append(entity)
print(list_entity)
```
Now the list is no longer empty.
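To get the output format the question asks for (one line per sentence, entities separated by commas), you can keep the entities grouped by sentence instead of flattening them into one list. The sketch below does not call polyglot: it uses plain `(tag, words)` tuples as a hypothetical stand-in for what `sentence.entities` returns, so it runs without the polyglot models installed. The `I-ORG` tag for Microsoft is an assumption for illustration.

```python
from typing import List, Tuple

# Hypothetical stand-in for polyglot output: one list per sentence,
# each entity represented as (tag, list_of_words).
Entity = Tuple[str, List[str]]

def format_entities(entities_per_sentence: List[List[Entity]]) -> List[str]:
    """Join each entity's words with spaces and each sentence's entities with ', '."""
    lines = []
    for entities in entities_per_sentence:
        lines.append(", ".join(" ".join(words) for _tag, words in entities))
    return lines

# Sample data mirroring the question's two sentences (tags are assumed).
sample = [
    [("I-PER", ["Bill", "Gate"]), ("I-ORG", ["Microsoft"])],
    [("I-PER", ["Trump"]), ("I-LOC", ["USA"])],
]
for line in format_entities(sample):
    print(line)
```

With polyglot, you could presumably build `sample` as `[[(e.tag, list(e)) for e in s.entities] for s in file.sentences]`; treat that mapping as an assumption to verify against your polyglot version.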
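As for your question about identifying entity terms: I haven't found a way to construct entities manually, so the code below only checks whether the sentence contains entities made of the same terms. A chunk can contain several strings, so we iterate over each of them.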
```python
from polyglot.text import Text

with open('input_raw.txt', 'r', encoding='utf-8') as f:
    input_file = f.read()

file = Text(input_file, hint_language_code='fa')

def check_sentence(entities_list, sentence):
    """Return True if every string term appears in some entity of the sentence."""
    for term in entities_list:
        # Compare the term with every word of every Chunk
        # object in the sentence and see if there's any match.
        if not any(any(entity_term == term for entity_term in entity_object)
                   for entity_object in sentence.entities):
            return False
    return True

sentence_number = 1  # Which sentence to check (0-based, so 1 is the second sentence)
sentence = file.sentences[sentence_number]
entity_terms = ["Bill", "Gates"]
if check_sentence(entity_terms, sentence):
    print(f"Entity terms {entity_terms} are in the sentence '{sentence}'")
else:
    print(f"Sentence '{sentence}' doesn't contain the terms {entity_terms}")
```
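Once you find a way to construct arbitrary entities, all you need to do is stop unpacking the individual terms in the sentence checker and compare whole entities instead, so that you can compare the entity types as well.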
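A hedged sketch of that idea: if each entity is represented as a `(tag, words)` pair, comparing whole pairs checks the entity type as well as the words. The `(tag, tuple_of_words)` representation and the `contains_entities` helper are hypothetical, not part of polyglot's API.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical representation: (tag, words), with words as a tuple so it is hashable.
Entity = Tuple[str, Tuple[str, ...]]

def contains_entities(wanted: Iterable[Entity], sentence_entities: Sequence[Entity]) -> bool:
    """Return True only if every wanted entity (tag AND words) occurs in the sentence."""
    return all(entity in sentence_entities for entity in wanted)

sentence_entities = [("I-PER", ("Trump",)), ("I-LOC", ("USA",))]
print(contains_entities([("I-PER", ("Trump",))], sentence_entities))  # True
print(contains_entities([("I-LOC", ("Trump",))], sentence_entities))  # False: same word, different tag
```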
If you just want to match the entities collected from the file against a specific sentence, you can do this:
```python
from polyglot.text import Text

with open('input_raw.txt', 'r', encoding='utf-8') as f:
    input_file = f.read()

file = Text(input_file, hint_language_code='fa')

def return_match(entities_list, sentence):
    """Return the words of every entity in entities_list that also occurs in the sentence."""
    matches = []
    for term in entities_list:
        # Compare each Chunk from the list with each Chunk
        # object in the sentence and see if there's any match.
        for entity in sentence.entities:
            if entity == term:
                for word in entity:
                    matches.append(word)
    return matches

def return_list_of_entities(file):
    """Collect the entities of every sentence in the Text object."""
    list_entity = []
    for sentence in file.sentences:
        for entity in sentence.entities:
            list_entity.append(entity)
    return list_entity

list_entity = return_list_of_entities(file)
sentence_number = 1  # Which sentence to check (0-based, so 1 is the second sentence)
sentence = file.sentences[sentence_number]
match = return_match(list_entity, sentence)
if match:
    print(f"Entity terms {match} are in the sentence '{sentence}'")
else:
    print(f"Sentence '{sentence}' doesn't contain any of the terms {list_entity}")
```
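When the entity list gets long, the nested loops in `return_match` compare every known entity with every sentence entity. One possible alternative, sketched below with the same hypothetical `(tag, words)` tuples rather than polyglot's own `Chunk` objects, is to put the known entities in a set so each sentence entity is looked up in constant average time. This version compares the tag as well as the words and returns each matching entity once, which may differ from how `Chunk` equality behaves in `return_match`.

```python
from typing import Iterable, List, Sequence, Tuple

Entity = Tuple[str, Tuple[str, ...]]  # hypothetical (tag, words) representation

def matching_entities(known: Iterable[Entity], sentence_entities: Sequence[Entity]) -> List[Entity]:
    """Return the sentence's entities that also appear in `known`, in sentence order."""
    known_set = set(known)  # set lookup instead of a nested loop over `known`
    return [entity for entity in sentence_entities if entity in known_set]

known = [("I-PER", ("Bill", "Gate")), ("I-ORG", ("Microsoft",)), ("I-PER", ("Trump",))]
sentence = [("I-PER", ("Trump",)), ("I-LOC", ("USA",))]
print(matching_entities(known, sentence))  # [('I-PER', ('Trump',))]
```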