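I have been learning NLP text classification from the book "Text Analytics with Python". Several modules need to be installed in a virtual environment. I use an Anaconda env. I created a blank env with Python 3.7 and installed the required pandas, numpy, nltk, gensim, sklearn ... Then I had to install Pattern. The first problem is that I cannot install Pattern via conda because of a conflict between Pattern and mkl_random.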
(nlp) D:\Python\Text_classification>conda install -c mickc pattern
Solving environment: failed
UnsatisfiableError: The following specifications were found to be in conflict:
- mkl_random
- pattern
Use "conda info <package>" to see the dependencies for each package.
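Removing mkl_random is not an option, because other packages depend on it: gensim, numpy, scikit-learn, etc. I don't know what to do; I could not find a conda build of Pattern that works for me. So I installed Pattern with pip, and the installation succeeded. Is it OK to have packages from both conda and pip in the same env?
I think the second problem is related to the first. I downloaded the book's sample code from https://github.com/dipanjanS/text-analytics-with-python/tree/master/Old-First-Edition/source_code/Ch04_Text_Classification, added parentheses to the Python 2.x 'print' statements, and ran classification.py. The program raised an exception: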
Traceback (most recent call last):
File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 609, in _read
raise StopIteration
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "classification.py", line 50, in <module>
norm_train_corpus = normalize_corpus(train_corpus)
File "D:\Python\Text_classification\normalization.py", line 96, in normalize_corpus
text = lemmatize_text(text)
File "D:\Python\Text_classification\normalization.py", line 67, in lemmatize_text
pos_tagged_text = pos_tag_text(text)
File "D:\Python\Text_classification\normalization.py", line 58, in pos_tag_text
tagged_text = tag(text)
File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\en\__init__.py", line 188, in tag
for sentence in parse(s, tokenize, True, False, False, False, encoding, **kwargs).split():
File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\en\__init__.py", line 169, in parse
return parser.parse(s, *args, **kwargs)
File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 1172, in parse
s[i] = self.find_tags(s[i], **kwargs)
File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\en\__init__.py", line 114, in find_tags
return _Parser.find_tags(self, tokens, **kwargs)
File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 1113, in find_tags
lexicon = kwargs.get("lexicon", self.lexicon or {}),
File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 376, in __len__
return self._lazy("__len__")
File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 368, in _lazy
self.load()
File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 625, in load
dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if len(x.split(" ")) > 1))
File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 625, in <genexpr>
dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if len(x.split(" ")) > 1))
RuntimeError: generator raised StopIteration
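I don't understand what is going on. Is the exception raised because I installed Pattern with pip, or is the problem wrong or deprecated code in the book? And is it possible to install Pattern in conda alongside all the other required packages?
Thanks in advance!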
Answer 0 (score: 0)
Switching to Python 3.6 solved this problem for me.
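A likely reason the Python version matters: the traceback in the question shows a bare "raise StopIteration" inside Pattern's _read generator, followed by "RuntimeError: generator raised StopIteration". Since Python 3.7 (PEP 479), a StopIteration raised inside a generator body is converted into a RuntimeError, whereas on 3.6 it only triggers a deprecation warning. The sketch below reproduces that behaviour in isolation. The function names are hypothetical and this is not Pattern's actual code, only the same idiom.

```python
import sys


def read_lines_legacy(lines):
    """Hypothetical stand-in for a generator that ends with the old idiom."""
    for line in lines:
        yield line
    # Legacy way to end a generator. On Python 3.7+ this is turned into
    # RuntimeError("generator raised StopIteration") by PEP 479.
    raise StopIteration


def read_lines_fixed(lines):
    """Same generator, ended with a plain return, which works on every version."""
    for line in lines:
        yield line
    return


def try_consume(gen_func):
    """Exhaust the generator and report either its items or the RuntimeError."""
    try:
        return list(gen_func(["a b", "c d"]))
    except RuntimeError as exc:
        return "RuntimeError: {}".format(exc)


legacy_result = try_consume(read_lines_legacy)
fixed_result = try_consume(read_lines_fixed)
print(sys.version_info[:2], legacy_result, fixed_result)
```

On 3.7 or later the legacy version reports the same RuntimeError as in the question, while the version that ends with return yields both lines. That suggests the exception comes from the Pattern release itself running on 3.7, not from installing it with pip.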
If you are using conda, first set up an environment, specifying that you want Python 3.6, and then install whatever packages you need in it.
conda create --name myenv python=3.6 pandas numpy gensim jupyter
conda activate myenv
For some reason, I did not need to install Pattern directly.
Related Gensim note: https://github.com/RaRe-Technologies/gensim/issues/2438
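To check that the new environment really runs Python 3.6 and to see which of the packages from the question are visible in it (including Pattern), something like the following can be run inside the activated env. It is only a sketch: check_environment is a hypothetical helper, and the module list is taken from the question.

```python
import importlib.util
import sys


def check_environment(module_names):
    """Return the interpreter (major, minor) version and, for each top-level
    module name, whether it can be located, without actually importing it."""
    found = {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }
    return sys.version_info[:2], found


version, found = check_environment(
    ["pandas", "numpy", "nltk", "gensim", "sklearn", "pattern"]
)
print("Python %d.%d" % version)
for name, ok in sorted(found.items()):
    print("%-8s %s" % (name, "found" if ok else "missing"))
```

If pattern shows up as missing and the book's code still needs it, it would have to be installed separately in that env.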