I am using Stanford CoreNLP, and I use this line to load some annotators to process my text:
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
Is there an annotator I can load to chunk the text?
Or does anyone have a suggestion for an alternative way to chunk text with Stanford CoreNLP?
Thanks.
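Answer 0 (score: 5)
I think the parser output can be used to obtain NP chunks. Take a look at the context-free representation on the Stanford Parser website, which provides example output.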
Answer 1 (score: 5)
To chunk with Stanford NLP, you can use the following packages:
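Answer 2 (score: 0)
What you need is the output of constituency parsing in CoreNLP, which gives you chunk information such as verb phrases (VPs), noun phrases (NPs), and so on. As far as I know, though, CoreNLP has no method that hands you a list of chunks. That means you have to parse the actual output of the constituency parse yourself to extract them.
For example, here is the output of the CoreNLP constituency parser for a sample sentence: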
(ROOT (S ("" "") (NP (NNP Anarchism)) (VP (VBZ is) (NP (NP (DT a) (JJ political) (NN philosophy)) (SBAR (WHNP (WDT that)) (S (VP (VBZ advocates) (NP (NP (JJ self-governed) (NNS societies)) (VP (VBN based) (PP (IN on) (NP (JJ voluntary) (, ,) (JJ cooperative) (NNS institutions))))))))) (, ,) (S (VP (VBG rejecting) (NP (JJ unjust) (NN hierarchy))))) (. .)))
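As you can see, the string contains NP and VP labels, so you now have to parse this string to extract the actual text of each chunk. Let me know if you find a way to get a list of chunks directly!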
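For what it's worth, parsing that bracketed string does not need much code. Below is a minimal sketch in plain Python, assuming the parse arrives as a Penn-Treebank-style bracketed string like the one above. The helper names (`parse_bracketed`, `leaves`, `extract_chunks`) and the short sample sentence are made up for illustration and are not part of CoreNLP.

```python
import re


def parse_bracketed(text):
    """Parse a bracketed parse string into nested (label, children) tuples.

    Each child is either another (label, children) tuple or a leaf word (str).
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])   # a leaf word
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return parse_node()


def leaves(node):
    """Return the words under a node, left to right."""
    words = []
    for child in node[1]:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def extract_chunks(node, labels):
    """Collect the text of every constituent whose label is in `labels`."""
    found = []
    if node[0] in labels:
        found.append(" ".join(leaves(node)))
    for child in node[1]:
        if not isinstance(child, str):
            found.extend(extract_chunks(child, labels))
    return found


# Hypothetical sample parse, written by hand for this example.
sample = "(ROOT (S (NP (DT The) (NN cat)) (VP (VBD chased) (NP (DT a) (NN mouse))) (. .)))"
tree = parse_bracketed(sample)
print(extract_chunks(tree, {"NP"}))   # ['The cat', 'a mouse']
print(extract_chunks(tree, {"VP"}))   # ['chased a mouse']
```

Note that this returns nested constituents as well, so an outer NP and the NPs inside it both show up in the list.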
Answer 3 (score: 0)
Expanding on Pedram's answer, the following code can be used:
from nltk.parse.corenlp import CoreNLPParser

# Assumes a CoreNLP server is running locally on port 9000.
nlp = CoreNLPParser('http://localhost:9000')


def extract_phrase(trees, labels):
    """Return the text of every subtree whose label is in `labels`."""
    phrases = []
    # `trees` is the ROOT tree; iterating over it yields its child subtrees.
    for tree in trees:
        for subtree in tree.subtrees():
            if subtree.label() in labels:
                phrases.append(' '.join(subtree.leaves()))
    return phrases


def get_chunks(sentence):
    # raw_parse returns an iterator of parse trees; take the first one.
    trees = next(nlp.raw_parse(sentence))
    # 'CC' is a POS tag, so any coordinating conjunctions are collected here too.
    nps = extract_phrase(trees, ['NP', 'CC'])
    vps = extract_phrase(trees, ['VP'])
    return trees, nps, vps


if __name__ == '__main__':
    dialog = [
        "Anarchism is a political philosophy that advocates self-governed societies based on voluntary cooperative institutions rejecting unjust hierarchy"
    ]
    for sentence in dialog:
        trees, nps, vps = get_chunks(sentence)
        print("\n\n")
        print("Sentence: ", sentence)
        print("Tree:\n", trees)
        print("Noun Phrases: ", nps)
        print("Verb Phrases: ", vps)

"""
Sentence: Anarchism is a political philosophy that advocates self-governed societies based on voluntary cooperative institutions rejecting unjust hierarchy
Tree:
(ROOT
  (S
    (NP (NN Anarchism))
    (VP
      (VBZ is)
      (NP
        (NP (DT a) (JJ political) (NN philosophy))
        (SBAR
          (WHNP (WDT that))
          (S
            (VP
              (VBZ advocates)
              (NP
                (ADJP (NN self) (HYPH -) (VBN governed))
                (NNS societies))
              (PP
                (VBN based)
                (PP
                  (IN on)
                  (NP
                    (NP
                      (JJ voluntary)
                      (JJ cooperative)
                      (NNS institutions))
                    (VP
                      (VBG rejecting)
                      (NP (JJ unjust) (NN hierarchy)))))))))))))
Noun Phrases: ['Anarchism', 'a political philosophy that advocates self - governed societies based on voluntary cooperative institutions rejecting unjust hierarchy', 'a political philosophy', 'self - governed societies', 'voluntary cooperative institutions rejecting unjust hierarchy', 'voluntary cooperative institutions', 'unjust hierarchy']
Verb Phrases: ['is a political philosophy that advocates self - governed societies based on voluntary cooperative institutions rejecting unjust hierarchy', 'advocates self - governed societies based on voluntary cooperative institutions rejecting unjust hierarchy', 'rejecting unjust hierarchy']
"""
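The noun phrase list in that output is nested: long NPs appear alongside the shorter NPs they contain. If you want something closer to flat chunks, one option is to keep only the innermost phrases. The sketch below is a rough heuristic that works on the phrase strings alone, assuming space-separated tokens as produced above. The function names are made up for illustration, and because it compares token sequences rather than tree positions, it could misfire when the same words occur twice in one sentence.

```python
def contains_tokens(outer, inner):
    """Return True if token list `inner` occurs contiguously inside `outer`."""
    n = len(inner)
    return any(outer[i:i + n] == inner for i in range(len(outer) - n + 1))


def innermost_phrases(phrases):
    """Drop every phrase that contains another, shorter phrase from the list."""
    tokenized = [p.split() for p in phrases]
    kept = []
    for i, outer in enumerate(tokenized):
        has_nested = any(
            j != i and 0 < len(inner) < len(outer) and contains_tokens(outer, inner)
            for j, inner in enumerate(tokenized)
        )
        if not has_nested:
            kept.append(phrases[i])
    return kept


# The noun phrases printed in the output above.
noun_phrases = [
    'Anarchism',
    'a political philosophy that advocates self - governed societies based on voluntary cooperative institutions rejecting unjust hierarchy',
    'a political philosophy',
    'self - governed societies',
    'voluntary cooperative institutions rejecting unjust hierarchy',
    'voluntary cooperative institutions',
    'unjust hierarchy',
]
print(innermost_phrases(noun_phrases))
# ['Anarchism', 'a political philosophy', 'self - governed societies',
#  'voluntary cooperative institutions', 'unjust hierarchy']
```

Filtering on the tree itself (keeping NP subtrees that have no NP descendant) would be more robust, but this version needs nothing beyond the lists the code above already returns.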