I am trying to train my own relation extraction model, as described here, but I keep getting a strange error.
My properties file:
#Below are some basic options. See edu.stanford.nlp.ie.machinereading.MachineReadingProperties class for more options.
# Pipeline options
annotators = pos, lemma, parse
parse.maxlen = 100
# MachineReading properties. You need one class to read the dataset into correct format. See edu.stanford.nlp.ie.machinereading.domains.ace.AceReader for another example.
datasetReaderClass = edu.stanford.nlp.ie.machinereading.domains.roth.RothCONLL04Reader
readerLogLevel = INFO
#Data directory for training. The datasetReaderClass reads data from this path and makes corresponding sentences and annotations.
trainPath = ../re-training-data.corp
#Whether to crossValidate, that is evaluate, or just train.
crossValidate = false
kfold = 10
#Change this to true if you want to use CoreNLP pipeline generated NER tags. The default model generated with the relation extractor release uses the CoreNLP pipeline provided tags (option set to true$
trainUsePipelineNER=true
# where to save training sentences. uses the file if it exists, otherwise creates it.
serializedTrainingSentencesPath = tmp/roth_sentences.ser
serializedEntityExtractorPath = tmp/roth_entity_model.ser
# where to store the output of the extractor (sentence objects with relations generated by the model). This is what you will use as the model when using 'relation' annotator in the CoreNLP pipeline.
serializedRelationExtractorPath = tmp/kpl-relation-model-pipeline.ser
# uncomment to load a serialized model instead of retraining
# loadModel = true
#relationResultsPrinters = edu.stanford.nlp.ie.machinereading.RelationExtractorResultsPrinter,edu.stanford.nlp.ie.machinereading.domains.roth.RothResultsByRelation. For printing output of the model.
relationResultsPrinters = edu.stanford.nlp.ie.machinereading.RelationExtractorResultsPrinter
#In this domain, this is trivial since all the entities are given (or set using CoreNLP NER tagger).
entityClassifier = edu.stanford.nlp.ie.machinereading.domains.roth.RothEntityExtractor
extractRelations = true
extractEvents = false
#We are setting the entities beforehand so the model does not learn how to extract entities etc.
extractEntities = false
#Opposite of crossValidate.
trainOnly=true
# The set chosen by feature selection using RothCONLL04:
relationFeatures = arg_words,arg_type,dependency_path_lowlevel,dependency_path_words,surface_path_POS,entities_between_args,full_tree_path
This is what I run in the terminal:
sudo java -cp stanford-corenlp-3.7.0.jar:stanford-corenlp-3.7.0-models.jar edu.stanford.nlp.ie.machinereading.MachineReading --arguments kpl-re-model.properties
The error indicates that it cannot find 'tmp/roth_sentences.ser', which makes no sense to me, because it is supposed to create that file.
Any ideas?
Thanks! Simon.
Answer 0 (score: 1)
I think it should work if you change tmp/roth_sentences.ser to roth_sentences.ser. My guess is that /home/ubuntu/stanford-corenlp-full-2016-10-31/tmp does not exist, so it crashes when it tries to write the file.
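If you would rather keep the tmp/ prefix on the three serialized paths in the properties file, creating the directory before training should have the same effect. This is a minimal sketch, assuming the command is launched from the CoreNLP directory mentioned above, so that the relative path tmp/ resolves inside it:

```shell
# Assumption: the current working directory is the one that holds
# kpl-re-model.properties, so relative paths in it resolve from here.
mkdir -p tmp   # creates the directory; does nothing if it already exists

# Then rerun the training command from the question unchanged:
# java -cp stanford-corenlp-3.7.0.jar:stanford-corenlp-3.7.0-models.jar \
#   edu.stanford.nlp.ie.machinereading.MachineReading --arguments kpl-re-model.properties
```

The -p flag keeps the step safe to repeat, so it can sit at the top of a training script without failing on later runs.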