I am using Python 3.6 in an Azure Jupyter notebook, on Windows 8.1 64-bit, for an NLP project. I am trying to install the Stanford NER Tagger. I downloaded Java and all three components:
1) stanford-ner-2015-12-09, 2) stanford-parser-full-2018-10-17, 3) stanford-postagger-2015-12-09
and uploaded all three to my directory on Azure Jupyter notebooks. To test the Stanford NER Tagger, I ran the following code, as suggested at https://pythonprogramming.net/named-entity-recognition-stanford-ner-tagger/:
# -*- coding: utf-8 -*-
from nltk.tag import StanfordNERTagger
from nltk.tokenize import word_tokenize
st = StanfordNERTagger('/usr/share/stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz',
                       '/usr/share/stanford-ner/stanford-ner.jar',
                       encoding='utf-8')
text = 'While in France, Christine Lagarde discussed short-term stimulus efforts in a recent interview with the Wall Street Journal.'
tokenized_text = word_tokenize(text)
classified_text = st.tag(tokenized_text)
print(classified_text)
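However, I get the following LookupError, which essentially says to set the CLASSPATH environment variable. I have installed many packages on Azure Jupyter notebooks before, but this is the first time I have been asked to set CLASSPATH. I went through the documentation available online, but unfortunately I have not found anyone addressing this problem for the Azure Jupyter environment. Any help is appreciated.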
---------------------------------------------------------------------------
LookupError Traceback (most recent call last)
<ipython-input-7-b3696afa5972> in <module>
4 from nltk.tokenize import word_tokenize
5
----> 6 st = StanfordNERTagger('stanford-ner.jar',encoding='utf-8')
7
8 text = 'While in France, Christine Lagarde discussed short-term stimulus efforts in a recent interview with the Wall Street Journal.'
~/anaconda3_501/lib/python3.6/site-packages/nltk/tag/stanford.py in __init__(self, *args, **kwargs)
178
179 def __init__(self, *args, **kwargs):
--> 180 super(StanfordNERTagger, self).__init__(*args, **kwargs)
181
182 @property
~/anaconda3_501/lib/python3.6/site-packages/nltk/tag/stanford.py in __init__(self, model_filename, path_to_jar, encoding, verbose, java_options)
61 self._JAR, path_to_jar,
62 searchpath=(), url=_stanford_url,
---> 63 verbose=verbose)
64
65 self._stanford_model = find_file(model_filename,
~/anaconda3_501/lib/python3.6/site-packages/nltk/__init__.py in find_jar(name_pattern, path_to_jar, env_vars, searchpath, url, verbose, is_regex)
719 searchpath=(), url=None, verbose=False, is_regex=False):
720 return next(find_jar_iter(name_pattern, path_to_jar, env_vars,
--> 721 searchpath, url, verbose, is_regex))
722
723
~/anaconda3_501/lib/python3.6/site-packages/nltk/__init__.py in find_jar_iter(name_pattern, path_to_jar, env_vars, searchpath, url, verbose, is_regex)
714 (name_pattern, url))
715 div = '='*75
--> 716 raise LookupError('\n\n%s\n%s\n%s' % (div, msg, div))
717
718 def find_jar(name_pattern, path_to_jar=None, env_vars=(),
LookupError:
===========================================================================
NLTK was unable to find stanford-ner.jar! Set the CLASSPATH
environment variable.
For more information, on stanford-ner.jar, see:
<https://nlp.stanford.edu/software>
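
For reference, the call shown in the traceback passes 'stanford-ner.jar' as the first positional argument, which the signature in the traceback names model_filename, so path_to_jar is left unset and NLTK falls back to searching CLASSPATH. Below is a minimal sketch of one way to set the variables from inside the notebook itself, using os.environ. The folder location NER_DIR and the helper configure_stanford_ner are hypothetical names introduced for illustration. The sketch also assumes the unpacked stanford-ner-2015-12-09 folder contains stanford-ner.jar at its top level and a classifiers subfolder, and that NLTK reads STANFORD_MODELS to locate the model file. It has not been tested on Azure Notebooks.

```python
import os
from pathlib import Path

# Hypothetical location: wherever the stanford-ner-2015-12-09 archive was unpacked.
NER_DIR = Path.home() / "stanford-ner-2015-12-09"


def configure_stanford_ner(ner_dir):
    """Point CLASSPATH and STANFORD_MODELS at an unpacked Stanford NER folder.

    Assumes the folder holds stanford-ner.jar and a classifiers/ subfolder.
    """
    ner_dir = Path(ner_dir)
    jar = ner_dir / "stanford-ner.jar"
    models = ner_dir / "classifiers"
    # Fail early with a clear message if the upload did not land where expected.
    missing = [str(p) for p in (jar, models) if not p.exists()]
    if missing:
        raise FileNotFoundError("Not found: " + ", ".join(missing))
    # These only affect the current notebook kernel process.
    os.environ["CLASSPATH"] = str(jar)
    os.environ["STANFORD_MODELS"] = str(models)
    return jar, models


if NER_DIR.exists():
    configure_stanford_ner(NER_DIR)
    # With both variables set, the model file name alone may be sufficient:
    # from nltk.tag import StanfordNERTagger
    # st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz', encoding='utf-8')
```

Alternatively, passing the full model path and the full jar path as the first two arguments, as in the code at the top of the question, avoids the CLASSPATH lookup, provided those paths actually exist on the notebook server.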