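I am trying to tokenize text with Stanford CoreNLP so that I can use this git repo for text summarization. I have set the environment variable for Java 8 and I am using Python 2.7. When I run this command: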
echo "This is text tokenization" | java -cp C:\Users\Harshit\Downloads\stanford-corenlp-full-2016-10-31\stanford-corenlp-full-2016-10-31\stanford-corenlp-3.7.0.jar\ edu.stanford.nlp.process.PTBTokenizer.class
it works fine, and the output is:
"This
is
text
tokenization"
But when I run this command:
python make_datafiles.py /path/to/cnn/stories /path/to/dailymail/stories
I get this error:
'"java -cp"' is not recognized as an internal or external command,
operable program or batch file.
Exception: The tokenized stories directory cnn_stories_tokenized contains 0 files, but it should contain the same number as C:\Users\Harshit\Downloads\cnn_stories_tokenized\cnn_stories_tokenized (which has 92579 files). Was there an error during tokenization?
How can I fix this and tokenize the data files?
Answer 0 (score: 0)
Could you please check whether the Java path is configured correctly?
Steps to check the Java path:
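One way to run this check from Python itself is sketched below. It is only a rough sketch, not the code from the repo: find_on_path and build_tokenizer_command are hypothetical helper names, and the classpath argument is a placeholder for wherever your CoreNLP jar lives. The second helper is included because the error text suggests Windows was asked to run a program literally named "java -cp". That can happen when the executable and its flag are passed as a single argument, so it may be worth checking how the command is built as well as checking PATH.

```python
import os
import subprocess


def find_on_path(program, path=None):
    """Return the full path of `program` if it is found on PATH, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    # On Windows, executables carry extensions such as .EXE; PATHEXT lists them.
    if os.name == "nt":
        extensions = os.environ.get("PATHEXT", ".EXE").split(os.pathsep)
    else:
        extensions = [""]
    for directory in path.split(os.pathsep):
        for ext in extensions:
            candidate = os.path.join(directory, program + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


def build_tokenizer_command(classpath):
    """Build the java command as a list with one argument per element.

    `classpath` is a placeholder for the location of the CoreNLP jar.
    """
    # "java" and "-cp" are separate elements here. A single "java -cp"
    # element would be treated as the name of the program to run.
    return ["java", "-cp", classpath, "edu.stanford.nlp.process.PTBTokenizer"]


if __name__ == "__main__":
    java = find_on_path("java")
    if java is None:
        print("java was not found on PATH")
    else:
        print("java found at: " + java)
        # Ask the JVM for its version to confirm that it actually starts.
        subprocess.call([java, "-version"])
```

If the script reports that java was not found, the directory containing java.exe is probably missing from the PATH environment variable. If java is found but the error persists, compare how the tokenizer command is assembled in make_datafiles.py with the list returned by build_tokenizer_command.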