Question

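I'm trying to tokenize text with Stanford CoreNLP so that I can use this git repo for text summarization. I have set the environment variables for Java 8 and am using Python 2.7. When I run this command: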

echo "This is text tokenization" | java -cp C:\Users\Harshit\Downloads\stanford-corenlp-full-2016-10-31\stanford-corenlp-full-2016-10-31\stanford-corenlp-3.7.0.jar\ edu.stanford.nlp.process.PTBTokenizer.class

it works fine and the output is:

"This
is
text
tokenization"

But when I use the command:

python make_datafiles.py /path/to/cnn/stories /path/to/dailymail/stories

I get this error:

'"java -cp"' is not recognized as an internal or external command,
operable program or batch file.
Exception: The tokenized stories directory cnn_stories_tokenized contains 0 files, but it should contain the same number as C:\Users\Harshit\Downloads\cnn_stories_tokenized\cnn_stories_tokenized (which has 92579 files). Was there an error during tokenization?

How can I fix this and tokenize the data files?

Answer 1

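Could you check whether the Java path is configured correctly?

Steps to check the Java path:

Open cmd.
Run java -version
It should print the Java version, for example "java version 1.x.xxx".
If it does not, configure the Java path. The following link explains how to set it up: Environment variables for java installation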
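
The same check can also be run from Python, since make_datafiles.py starts Java as a subprocess. Below is a minimal sketch, not code taken from the repo: command_runs and build_tokenizer_command are hypothetical helper names, and the jar path is a placeholder. It passes java, -cp and the classpath as separate list items. That choice is based on a guess from the error text: '"java -cp"' is not recognized suggests the two words may have reached Windows as a single program name.

```python
import os
import subprocess


def command_runs(argv):
    """Return True if the program in argv[0] can be started and exits with code 0."""
    try:
        with open(os.devnull, "w") as devnull:
            return subprocess.call(argv, stdout=devnull, stderr=devnull) == 0
    except OSError:
        # Raised when the executable cannot be found or started.
        return False


def build_tokenizer_command(jar_path, extra_args=()):
    """Build the tokenizer command with one argument per list item.

    Hypothetical helper: "java" and "-cp" are separate items, so the
    operating system looks for a program called "java", not "java -cp".
    """
    command = ["java", "-cp", jar_path, "edu.stanford.nlp.process.PTBTokenizer"]
    command.extend(extra_args)
    return command


if __name__ == "__main__":
    if not command_runs(["java", "-version"]):
        print("java could not be started; check the PATH variable first")
    else:
        # Placeholder jar path: replace it with the real location of the jar.
        print(build_tokenizer_command("stanford-corenlp-3.7.0.jar"))
```

If command_runs(["java", "-version"]) returns False in the same console where the script is launched, the problem is the PATH setting and not the script itself.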
