Stanford NLP / Petrarch: sentences are being discarded

Time: 2015-11-01 21:49:33

Tags: python nlp stanford-nlp

I am trying to parse a series of news stories with Petrarch. According to its official documentation:

  

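The main input format for PETRARCH is an XML document with each entry in the document a sentence or story to be parsed. The inputs can be either individual sentences or entire stories. Additionally, the inputs can contain pre-parsed information from StanfordNLP or just the plain text with the parsing by StanfordNLP left to TABARI. Whether the input is parsed or not is indicated using the -P flag in the PETRARCH command-line arguments.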

In other words, Petrarch uses StanfordNLP as part of its parsing toolchain.

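My news documents are all in a single txt file with no XML structure (so there are no sentence attributes or ids, although there are dates). Before converting everything, I wanted to try a sample text to see whether it works; if it does, I will reprogram the rest of the texts into the corresponding format. Here is the sample: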

<document>
<Sentences>

<Sentence sentence = "Boolean" id = "1" date = "20151026">
    <Text>China, Japan and South Korea will hold a summit in South Korea when Chinese Premier Li Keqiang visits.</Text> 
</Sentence>
<Sentence sentence = "Boolean" id = "2" date = "20151027">
    <Text>It is the first China-Japan-South Korea meeting since they were discontinued in 2012 amid tension dating back to World War Two.</Text>    
</Sentence>
<Sentence sentence = "Boolean" id = "3" date = "20151027">
    <Text>Marry has a happy life.</Text>    
</Sentence>

</Sentences>
</document>
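
Converting plain-text records into this layout might look something like the minimal Python 3 sketch below. It assumes the txt file has already been split into (date, text) pairs; `build_petrarch_xml`, `records`, and `sentence_flag` are hypothetical names used for illustration and are not part of Petrarch. The element and attribute names simply mirror the sample above, including the `sentence = "Boolean"` value; whether Petrarch expects a different value or wrapper element there is not confirmed here.

```python
import xml.etree.ElementTree as ET


def build_petrarch_xml(records, sentence_flag="Boolean"):
    """Build an XML string shaped like the sample above.

    records: iterable of (date, text) pairs, with date as a YYYYMMDD string.
    sentence_flag: value for the `sentence` attribute (copied from the
    sample; an assumption, not a confirmed Petrarch requirement).
    """
    document = ET.Element("document")
    sentences = ET.SubElement(document, "Sentences")
    for idx, (date, text) in enumerate(records, start=1):
        # ids are assigned sequentially because the txt file has none
        sentence = ET.SubElement(
            sentences,
            "Sentence",
            {"sentence": sentence_flag, "id": str(idx), "date": date},
        )
        # ElementTree escapes characters such as & and < on output
        ET.SubElement(sentence, "Text").text = text.strip()
    return ET.tostring(document, encoding="unicode")
```

The returned string could then be written to a file such as reuter1025.xml and passed to the `petrarch parse -i` command shown below.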

Petrarch accepts the format and the program runs without errors, but it produces no output. Here are the commands I ran:

cd 
virtualenv venv
source venv/bin/activate
petrarch parse -i reuter1025.xml -o output.txt

Here is the log I copied from the terminal:

(venv)d-172-26-7-114:~ Carl$ petrarch parse -i reuter1025.xml -o output.txt

new_actor_length = 0
stop_on_error = False
write_actor_root = False
write_actor_text = False
require_dyad = True
code-by-sentence True
pause_by_sentence False
pause_by_story False
Comma-delimited clause elimination:
Initial : deactivated
Internal: min = 2    max = 8
Terminal: min = 2    max = 8
Verb dictionary: CAMEO.verbpatterns.150430.txt
Actor dictionaries: [u'Phoenix.Countries.actors.txt', u'Phoenix.International.actors.txt', u'Phoenix.MilNonState.actors.txt']
Agent dictionary: Phoenix.agents.txt
Discard dictionary: Phoenix.discards.txt
Issues dictionary: Phoenix.IssueCoding.txt

Setting up StanfordNLP. The program isn't dead. Promise.
Stanford setup complete. Starting parse of 3 stories...
Done with StanfordNLP parse...

Discard sentence:   CHINA FIRST 2012
Discard sentence:   CHINA FIRST 2012
Summary:
Stories read: 0    Sentences coded: 0   Events generated: 0
Discards:  Sentence 2   Story 0   Sentences without events: 0
Coding time: 5.003469944
Finished

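The problem seems to be that StanfordNLP is ignoring all of my sentences. For those with experience, is there something wrong with my input format? I would really like to get this working, and any ideas are appreciated!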

0 answers:

No answers yet