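I am trying to do anaphora resolution, and my code is below.
First I navigate to the folder where I downloaded the Stanford module. Then I run the following command at the command prompt to start the Stanford CoreNLP server: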
java -mx4g -cp "*;stanford-corenlp-full-2017-06-09/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
After that I run the following code in Python:
from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')
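I want to change the sentence Tom is a smart boy. He know a lot of thing.
into Tom is a smart boy. Tom know a lot of thing.
but I cannot find any tutorial or other help for doing this in Python.
All I have managed so far is to run the coreference resolution annotator from Python with the code below: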
output = nlp.annotate(sentence, properties={'annotators':'dcoref','outputFormat':'json','ner.useSUTime':'false'})
and then pull out the coreference chains:
coreferences = output['corefs']
This gives me the JSON below:
coreferences
{u'1': [{u'animacy': u'ANIMATE',
u'endIndex': 2,
u'gender': u'MALE',
u'headIndex': 1,
u'id': 1,
u'isRepresentativeMention': True,
u'number': u'SINGULAR',
u'position': [1, 1],
u'sentNum': 1,
u'startIndex': 1,
u'text': u'Tom',
u'type': u'PROPER'},
{u'animacy': u'ANIMATE',
u'endIndex': 6,
u'gender': u'MALE',
u'headIndex': 5,
u'id': 2,
u'isRepresentativeMention': False,
u'number': u'SINGULAR',
u'position': [1, 2],
u'sentNum': 1,
u'startIndex': 3,
u'text': u'a smart boy',
u'type': u'NOMINAL'},
{u'animacy': u'ANIMATE',
u'endIndex': 2,
u'gender': u'MALE',
u'headIndex': 1,
u'id': 3,
u'isRepresentativeMention': False,
u'number': u'SINGULAR',
u'position': [2, 1],
u'sentNum': 2,
u'startIndex': 1,
u'text': u'He',
u'type': u'PRONOMINAL'}],
u'4': [{u'animacy': u'INANIMATE',
u'endIndex': 7,
u'gender': u'NEUTRAL',
u'headIndex': 4,
u'id': 4,
u'isRepresentativeMention': True,
u'number': u'SINGULAR',
u'position': [2, 2],
u'sentNum': 2,
u'startIndex': 3,
u'text': u'a lot of thing',
u'type': u'NOMINAL'}]}
Any help on this?
Answer 0 (score: 2)
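For reference, the `corefs` output shown in the question can be read as a mapping from chain id to a list of mentions, where one mention per chain is flagged with `isRepresentativeMention`. The sketch below is only an illustration of walking that structure: the function name `pronoun_targets` is hypothetical, and the sample dictionary is a trimmed copy of the output above that keeps only the fields the sketch uses.

```python
def pronoun_targets(corefs):
    """Pair each pronominal mention with its chain's representative mention.

    Returns tuples of (sentNum, startIndex, pronoun text, representative text).
    """
    targets = []
    for chain in corefs.values():
        # Look for the mention flagged as representative; skip chains without one.
        representative = next(
            (m for m in chain if m.get('isRepresentativeMention')), None)
        if representative is None:
            continue
        for mention in chain:
            if mention['type'] == 'PRONOMINAL':
                targets.append((mention['sentNum'], mention['startIndex'],
                                mention['text'], representative['text']))
    return targets


# Trimmed copy of the question's output (assumption: other fields are not needed here).
corefs = {
    '1': [
        {'text': 'Tom', 'type': 'PROPER', 'sentNum': 1, 'startIndex': 1,
         'isRepresentativeMention': True},
        {'text': 'a smart boy', 'type': 'NOMINAL', 'sentNum': 1, 'startIndex': 3,
         'isRepresentativeMention': False},
        {'text': 'He', 'type': 'PRONOMINAL', 'sentNum': 2, 'startIndex': 1,
         'isRepresentativeMention': False},
    ],
    '4': [
        {'text': 'a lot of thing', 'type': 'NOMINAL', 'sentNum': 2, 'startIndex': 3,
         'isRepresentativeMention': True},
    ],
}

print(pronoun_targets(corefs))  # [(2, 1, 'He', 'Tom')]
```

Both `sentNum` and `startIndex` are 1-based in this output, so subtract 1 before indexing into the `sentences` and `tokens` lists.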
I had a similar problem. After trying to handle it with CoreNLP, I solved it with NeuralCoref. The following code does the job with NeuralCoref quite easily:
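import spacy
nlp = spacy.load('en_coref_md')
doc = nlp(u'Phone area code will be valid only when all the below conditions are met. It cannot be left blank. It should be numeric. It cannot be less than 200. Minimum number of digits should be 3. ')
print(doc._.coref_clusters)
print(doc._.coref_resolved)
The output of the above code is: [Phone area code: [Phone area code, It, It, It]]
Phone area code will be valid only when all the below conditions are met. Phone area code cannot be left blank. Phone area code should be numeric. Phone area code cannot be less than 200. Minimum number of digits should be 3.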
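For this you will need spaCy together with one of the English coreference models, such as en_coref_md, en_coref_lg or en_coref_sm. The NeuralCoref project documentation explains this in more detail.
Answer 1 (score: 2)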
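Here is one possible solution that works directly on the data structure output by CoreNLP, which already contains all the information needed. It is not meant as a complete solution and will probably need extending to handle every case, but it is a good starting point.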
from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')


def resolve(corenlp_output):
    """ Transfer the word form of the antecedent to its associated pronominal anaphor(s) """
    for coref in corenlp_output['corefs']:
        mentions = corenlp_output['corefs'][coref]
        antecedent = mentions[0]  # the antecedent is the first mention in the coreference chain
        for j in range(1, len(mentions)):
            mention = mentions[j]
            if mention['type'] == 'PRONOMINAL':
                # get the attributes of the target mention in the corresponding sentence
                target_sentence = mention['sentNum']
                target_token = mention['startIndex'] - 1
                # transfer the antecedent's word form to the appropriate token in the sentence
                corenlp_output['sentences'][target_sentence - 1]['tokens'][target_token]['word'] = antecedent['text']


def print_resolved(corenlp_output):
    """ Print the "resolved" output """
    possessives = ['hers', 'his', 'their', 'theirs']
    for sentence in corenlp_output['sentences']:
        for token in sentence['tokens']:
            output_word = token['word']
            # check lemmas as well as tags for possessive pronouns in case of tagging errors
            if token['lemma'] in possessives or token['pos'] == 'PRP$':
                output_word += "'s"  # add the possessive morpheme
            output_word += token['after']
            print(output_word, end='')


text = "Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but " \
       "hers is blue. It is older than hers. The big cat ate its dinner."

output = nlp.annotate(text, properties={'annotators': 'dcoref', 'outputFormat': 'json', 'ner.useSUTime': 'false'})

resolve(output)

print('Original:', text)
print('Resolved: ', end='')
print_resolved(output)
This gives the following output:
Original: Tom and Jane are good friends. They are cool. He knows a lot of things and so does she. His car is red, but hers is blue. It is older than hers. The big cat ate its dinner.
Resolved: Tom and Jane are good friends. Tom and Jane are cool. Tom knows a lot of things and so does Jane. Tom's car is red, but Jane's is blue. His car is older than Jane's. The big cat ate The big cat's dinner.
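As you can see, this solution does not correct the capitalization when a pronoun's antecedent was sentence-initial and therefore title-cased (in the last sentence it produces "The big cat" where "the big cat" is wanted). The fix depends on the category of the antecedent: common-noun antecedents need to be lowercased, while proper-noun antecedents do not.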
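A rough sketch of that case correction follows. It is an assumption-laden illustration rather than part of the answer's code: `adjust_case` is a hypothetical helper, and it relies on the mention `type` values seen in the question's output (`PROPER`, `NOMINAL`, `PRONOMINAL`). It will get some cases wrong, for example a common-noun phrase that begins with a proper name.

```python
def adjust_case(antecedent_text, antecedent_type, sentence_initial):
    """Adapt the first letter of an antecedent to the position of the pronoun it replaces.

    antecedent_type is the CoreNLP mention type of the antecedent.
    sentence_initial says whether the replaced pronoun starts its sentence.
    """
    if not antecedent_text:
        return antecedent_text
    first, rest = antecedent_text[0], antecedent_text[1:]
    if sentence_initial:
        # Any replacement that starts a sentence gets a capital letter.
        return first.upper() + rest
    if antecedent_type != 'PROPER':
        # Common-noun antecedents lose the capital they had at sentence start.
        return first.lower() + rest
    # Proper nouns keep their own capitalization.
    return antecedent_text


print(adjust_case('The big cat', 'NOMINAL', False))  # the big cat
print(adjust_case('Tom', 'PROPER', False))           # Tom
print(adjust_case('the big cat', 'NOMINAL', True))   # The big cat
```

Inside `resolve`, the third argument could be derived from the pronoun's position, for instance `mention['startIndex'] == 1`, and the result used in place of `antecedent['text']`.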
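Some other ad hoc processing may also be necessary (as with the possessives in my test sentence). The code also assumes that you will not want to reuse the original output tokens, since it modifies them in place. One way around this is to make a copy of the original data structure, or to create a new attribute and change the print_resolved function accordingly.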
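The copy-based workaround could look something like the sketch below. The wrapper name `resolved_copy` is hypothetical, and the tiny dictionary and `fake_resolve` function are stand-ins for a real CoreNLP response and the `resolve` function, so that the example runs without a server.

```python
import copy


def resolved_copy(corenlp_output, resolve_fn):
    """Run an in-place resolver on a deep copy so the original output stays untouched."""
    working = copy.deepcopy(corenlp_output)
    resolve_fn(working)
    return working


# Stand-in data: a real response has many more fields per token.
original = {'sentences': [{'tokens': [{'word': 'He'}]}]}


def fake_resolve(output):
    """Stand-in for resolve(): overwrites the first token in place."""
    output['sentences'][0]['tokens'][0]['word'] = 'Tom'


resolved = resolved_copy(original, fake_resolve)
print(original['sentences'][0]['tokens'][0]['word'])  # He
print(resolved['sentences'][0]['tokens'][0]['word'])  # Tom
```

A deep copy is needed here because the tokens are nested dictionaries; a shallow copy would still share them with the original.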
Correcting any errors made by the resolver itself is yet another challenge!
Answer 2 (score: 0)
from stanfordnlp.server import CoreNLPClient
from nltk import tokenize

client = CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'parse', 'coref'],
                       memory='4G', endpoint='http://localhost:9001')


def pronoun_resolution(text):
    ann = client.annotate(text)
    # split into sentences so each mention can be replaced in its own sentence
    modified_text = tokenize.sent_tokenize(text)
    for coref in ann.corefChain:
        antecedent = None
        for mention in coref.mention:
            # rebuild the mention's surface text from its tokens
            phrase = []
            for i in range(mention.beginIndex, mention.endIndex):
                phrase.append(ann.sentence[mention.sentenceIndex].token[i].word)
            if antecedent is None:
                # the first mention in the chain is taken as the antecedent
                antecedent = ' '.join(phrase)
            else:
                # every later mention is replaced by the antecedent (plain substring replacement)
                anaphor = ' '.join(phrase)
                modified_text[mention.sentenceIndex] = modified_text[mention.sentenceIndex].replace(anaphor, antecedent)
    modified_text = ' '.join(modified_text)
    return modified_text


text = 'Tom is a smart boy. He knows a lot of things.'
print(pronoun_resolution(text))
Output: 'Tom is a smart boy. Tom knows a lot of things.'