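When I try to find entities in a long text input, Google Cloud Natural Language groups several words together and returns incorrect entities for them. Here is my program: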
import os
import sys

import six
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types


def entity_recognizer(nouns):
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/Users/superaitor/Downloads/link"
    # Join the list of words into one space-separated string
    text = ""
    for words in nouns:
        text += words + " "
    client = language.LanguageServiceClient()
    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)
    # Report offsets in the unit that matches this Python build's str indexing
    encoding = enums.EncodingType.UTF32
    if sys.maxunicode == 65535:
        encoding = enums.EncodingType.UTF16
    entities = client.analyze_entities(document, encoding).entities
    entity_type = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION',
                   'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER')
    for entity in entities:
        # if entity_type[entity.type] == "PERSON":
        print(entity_type[entity.type])
        print(entity.name)
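The `sys.maxunicode` check in the code above chooses the unit in which the API reports offsets such as `begin_offset`: with `UTF32` they count code points, matching Python 3 `str` indices, while `UTF16` counts 16-bit units, as Java and JavaScript do. Below is a minimal pure-Python sketch of the difference; both helper names are hypothetical. Since Python 3.3, `sys.maxunicode` is always 1114111, so on Python 3 the branch always picks `UTF32`. The encoding only affects offsets and has no bearing on how words are grouped into entities.

```python
import sys


def pick_encoding_name(maxunicode=sys.maxunicode):
    """Mirror the question's branch: narrow (UCS-2) builds get UTF16, all others UTF32."""
    return "UTF16" if maxunicode == 65535 else "UTF32"


def offset_in_units(text, index, encoding_name):
    """Convert a Python str index into the offset unit of the named encoding."""
    prefix = text[:index]
    if encoding_name == "UTF32":
        # One UTF-32 unit per code point, which is how Python 3 indexes str.
        return len(prefix)
    if encoding_name == "UTF16":
        # Code points above U+FFFF take a surrogate pair (two 16-bit units).
        return len(prefix.encode("utf-16-le")) // 2
    raise ValueError("unsupported encoding: " + encoding_name)


if __name__ == "__main__":
    text = "\U0001F600 liberty"  # an emoji outside the BMP, then a word
    start = text.index("liberty")
    print(pick_encoding_name())                   # UTF32 on Python 3.3+
    print(offset_in_units(text, start, "UTF32"))  # 2
    print(offset_in_units(text, start, "UTF16"))  # 3
```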
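In `entity_recognizer`, `nouns` is a list of words. I convert it into a string (I have tried several ways, and all of them give the same result), but the program spits out output like this: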
PERSON
liberty secularism etching domain professor lecturer tutor royalty
government adviser commissioner
OTHER
business view society economy
OTHER
business
OTHER
verge industrialization market system custom shift rationality
OTHER
family kingdom life drunkenness college student appearance income family
brink poverty life writer variety attitude capitalism age process
production factory system
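Both ways of building the string send the API essentially the same text: one long run of nouns with no verbs or punctuation between them. As an assumption (not something the API documents), the parser may read such a run as a few long noun phrases, which would match the grouped output above. The sketch below compares the loop with `str.join` and adds a hypothetical `join_as_sentences` variant that ends each noun with a period; whether that variant actually stops the grouping is untested.

```python
def join_with_loop(nouns):
    """The question's approach: append each word plus a space."""
    text = ""
    for word in nouns:
        text += word + " "
    return text


def join_with_str_join(nouns):
    """Idiomatic equivalent; differs only by the missing trailing space."""
    return " ".join(nouns)


def join_as_sentences(nouns):
    """Hypothetical variant: turn each noun into its own one-word sentence."""
    return " ".join(word + "." for word in nouns)


if __name__ == "__main__":
    nouns = ["business", "view", "society", "economy"]
    print(repr(join_with_loop(nouns)))      # 'business view society economy '
    print(repr(join_with_str_join(nouns)))  # 'business view society economy'
    print(repr(join_as_sentences(nouns)))   # 'business. view. society. economy.'
```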
Any ideas on how to fix this?
Answer 0 (score: 0)
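Instead of classifying by entities, I would use Google's default categories directly, changing
entity = client.analyze_entities(document, encoding).entities
to
categories = client.classify_text(document).categories
and updating the code accordingly. I wrote the following sample code based on this tutorial and developed it further on GitHub.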
def run_quickstart():
    # [START language_quickstart]
    # Imports the Google Cloud client library
    # [START migration_import]
    from google.cloud import language
    from google.cloud.language import enums
    from google.cloud.language import types
    # [END migration_import]

    # Instantiates a client
    # [START migration_client]
    client = language.LanguageServiceClient()
    # [END migration_client]

    # The text to analyze
    text = u'For its part, India has said it will raise taxes on 29 products imported from the US - including some agricultural goods, steel and iron products - in retaliation for the wide-ranging US tariffs.'
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects the sentiment of the text
    sentiment = client.analyze_sentiment(document=document).document_sentiment

    # Classify content categories
    categories = client.classify_text(document).categories

    # User category feedback
    for category in categories:
        print(u'=' * 20)
        print(u'{:<16}: {}'.format('name', category.name))
        print(u'{:<16}: {}'.format('confidence', category.confidence))

    # User sentiment feedback
    print('Text: {}'.format(text))
    print('Sentiment: {}, {}'.format(sentiment.score, sentiment.magnitude))
    # [END language_quickstart]


if __name__ == '__main__':
    run_quickstart()
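`classify_text` returns categories whose `name` is a slash-separated path such as `/Business & Industrial`, each with a `confidence` score. Below is a minimal sketch of handling them without calling the API. `Category`, `long_enough` and `top_level_labels` are hypothetical names, the sample values are made up, and the 0.5 threshold is an arbitrary choice. The sketch also assumes that classification rejects very short documents (the documentation has stated a minimum of 20 tokens; check the current limit), which matters for short noun lists like the one in the question.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects in client.classify_text(document).categories.
Category = namedtuple("Category", ["name", "confidence"])

MIN_TOKENS = 20  # assumed minimum; verify against the current classify_text docs


def long_enough(text, min_tokens=MIN_TOKENS):
    """Rough pre-check: count whitespace-separated words (the API's tokenizer may differ)."""
    return len(text.split()) >= min_tokens


def top_level_labels(categories, threshold=0.5):
    """Return the first path segment of each category at or above the threshold."""
    labels = []
    for category in categories:
        if category.confidence >= threshold:
            # '/Business & Industrial/Manufacturing' -> 'Business & Industrial'
            labels.append(category.name.strip("/").split("/")[0])
    return labels


if __name__ == "__main__":
    sample = [  # made-up values, not real API output
        Category("/Business & Industrial", 0.72),
        Category("/News/Politics", 0.41),
    ]
    print(top_level_labels(sample))                       # ['Business & Industrial']
    print(long_enough("business view society economy"))  # False
```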
Does this solution work for you? If not, why not?
Answer 1 (score: 0)
To analyze entities in a text, you can use the sample from the documentation:
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types
import six


def entities_text(text):
    """Detects entities in the text."""
    client = language.LanguageServiceClient()

    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    # Instantiates a plain text document.
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects entities in the document. You can also analyze HTML with:
    #   type=enums.Document.Type.HTML
    entities = client.analyze_entities(document).entities

    # entity types from enums.Entity.Type
    entity_type = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION',
                   'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER')

    for entity in entities:
        print('=' * 20)
        print(u'{:<16}: {}'.format('name', entity.name))
        print(u'{:<16}: {}'.format('type', entity_type[entity.type]))
        print(u'{:<16}: {}'.format('metadata', entity.metadata))
        print(u'{:<16}: {}'.format('salience', entity.salience))
        print(u'{:<16}: {}'.format('wikipedia_url',
              entity.metadata.get('wikipedia_url', '-')))


entities_text("Donald Trump is president of United States of America")
The output of this example is:
====================
name : Donald Trump
type : PERSON
metadata : <google.protobuf.pyext._message.ScalarMapContainer object at 0x7fd9d0125170>
salience : 0.9564903974533081
wikipedia_url : https://en.wikipedia.org/wiki/Donald_Trump
====================
name : United States of America
type : LOCATION
metadata : <google.protobuf.pyext._message.ScalarMapContainer object at 0x7fd9d01252b0>
salience : 0.04350961744785309
wikipedia_url : https://en.wikipedia.org/wiki/United_States
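As this example shows, entity analysis looks for known entities in the given text (proper nouns such as public figures, landmarks, and so on). It does not return an entity for every word in the text.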
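Building on this, here is a minimal sketch of post-processing the response so that only entities linked to a Wikipedia article are kept. It uses a hypothetical `Entity` namedtuple as a stand-in for the real response objects, which (as in the sample above) expose `name`, an integer `type` and a `metadata` map that supports `.get`. The type lookup uses a dict with a fallback instead of indexing the tuple, since newer versions of the API define extra entity types (such as `PHONE_NUMBER` and `DATE`) that the 8-entry tuple cannot index. It also compares type names with `==`/`!=`, because `is` tests object identity rather than string equality.

```python
from collections import namedtuple

# Hypothetical stand-in for the entity objects returned by analyze_entities.
Entity = namedtuple("Entity", ["name", "type", "metadata"])

# Integer values of entity.type, as listed in the tuple from the sample above.
ENTITY_TYPE_NAMES = {
    0: "UNKNOWN", 1: "PERSON", 2: "LOCATION", 3: "ORGANIZATION",
    4: "EVENT", 5: "WORK_OF_ART", 6: "CONSUMER_GOOD", 7: "OTHER",
}


def type_name(entity):
    """Map entity.type to a name, falling back for values not in the table."""
    return ENTITY_TYPE_NAMES.get(entity.type, "UNLISTED({})".format(entity.type))


def known_entities(entities, wanted_type=None):
    """Keep entities linked to a Wikipedia article, optionally of one type."""
    result = []
    for entity in entities:
        if not entity.metadata.get("wikipedia_url"):
            continue  # skip entities with no Wikipedia link
        if wanted_type is not None and type_name(entity) != wanted_type:
            continue  # compare strings with ==/!=, never `is`
        result.append(entity)
    return result


if __name__ == "__main__":
    sample = [
        Entity("Donald Trump", 1,
               {"wikipedia_url": "https://en.wikipedia.org/wiki/Donald_Trump"}),
        Entity("United States of America", 2,
               {"wikipedia_url": "https://en.wikipedia.org/wiki/United_States"}),
        Entity("business view society economy", 7, {}),  # made-up metadata
    ]
    print([e.name for e in known_entities(sample)])
    # ['Donald Trump', 'United States of America']
    print([e.name for e in known_entities(sample, "PERSON")])
    # ['Donald Trump']
```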