I'm new to Docker, and I'm trying to install some NLTK packages in a Docker image. Here is my Dockerfile:
FROM python:3-onbuild
RUN python -m libs.py
COPY start.sh /libs.py
COPY start.sh /start.sh
EXPOSE 8000
CMD ["/start.sh"]
Here is my libs.py, which lists the NLTK packages to download:
import nltk
nltk.data.path.append('./')
nltk.download('wordnet')
nltk.download('pros_cons')
nltk.download('snowball_data')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_ru')
nltk.download('punkt')
nltk.download('universal_tagset')
nltk.download('maxent_treebank_pos_tagger')
nltk.download('hmm_treebank_pos_tagger')
nltk.download('reuters')
nltk.download('treebank')
nltk.download('vader_lexicon')
nltk.download('porter_test')
nltk.download('rslp')
The Docker image builds successfully, but when I try to use these packages, it throws an error:
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt')
Searched in:
- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/usr/local/nltk_data'
- '/usr/local/lib/nltk_data'
- ''
**********************************************************************
Can anyone tell me why the NLTK packages are not installed? Thanks.
Answer 0 (score: 0)
It looks like you have to create a user in Docker. You should try to avoid running as root in Docker (which is the default).
However, you can set download_dir when using nltk.download():
download(self, info_or_id=None, download_dir=None, quiet=False, force=False, prefix='[nltk_data]', halt_on_error=True, raise_on_error=False):
If no value is set for download_dir, it will try to save to the default path:
# decide where we're going to save things to.
if self._download_dir is None:
self._download_dir = self.default_download_dir()
More specifically: https://github.com/nltk/nltk/blob/develop/nltk/downloader.py#L919
def default_download_dir(self):
"""
Return the directory to which packages will be downloaded by
default. This value can be overridden using the constructor,
or on a case-by-case basis using the ``download_dir`` argument when
calling ``download()``.
On Windows, the default download directory is
``PYTHONHOME/lib/nltk``, where *PYTHONHOME* is the
directory containing Python, e.g. ``C:\\Python25``.
On all other platforms, the default directory is the first of
the following which exists or which can be created with write
permission: ``/usr/share/nltk_data``, ``/usr/local/share/nltk_data``,
``/usr/lib/nltk_data``, ``/usr/local/lib/nltk_data``, ``~/nltk_data``.
"""
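So it saves the files in /root/nltk_data/.
When you run the Docker image, it looks like you are accessing the / directory with CMD ["/start.sh"], so there may be a permissions issue with /root/nltk_data.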
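The fallback rule described in the docstring above can be approximated with a few lines of standard-library Python. This is only a rough sketch of the documented behaviour, not NLTK's actual implementation; the function name first_usable_dir is made up for illustration, and the candidate list is copied from the docstring.

```python
import os


def first_usable_dir(candidates):
    """Return the first candidate that exists and is writable, or that could
    be created because its parent exists and is writable. Hypothetical helper,
    not part of NLTK."""
    for path in candidates:
        path = os.path.expanduser(path)
        if os.path.isdir(path):
            if os.access(path, os.W_OK):
                return path
            continue  # exists but is not writable: try the next candidate
        parent = os.path.dirname(path.rstrip(os.sep)) or os.sep
        if os.path.isdir(parent) and os.access(parent, os.W_OK):
            return path  # does not exist yet, but could be created
    return None


# Candidate directories as listed in the docstring above.
CANDIDATES = [
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
    "~/nltk_data",
]

print(first_usable_dir(CANDIDATES))
```

Running this inside the container, as the same user that runs the build step, should give a hint about where downloads are likely to land when download_dir is left unset.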
Explicitly set the path of the nltk_data directory to download into:
nltk.download('popular', download_dir='/path/to/nltk_data/')
Then, when you run a new Python instance, add that path:
nltk.data.path.append('/path/to/nltk_data/')
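Putting the two steps together, a build-time download script and a run-time path registration might look like the sketch below. The names NLTK_DATA_DIR, PACKAGES, download_packages and register_data_dir are hypothetical, and the package list is just a subset of the one in libs.py. The chosen directory is an assumption: it appears under "Searched in:" in the error message above, so NLTK should look there even without the extra append.

```python
import os

# Hypothetical target directory (assumption); it is one of the paths listed
# under "Searched in:" in the LookupError above.
NLTK_DATA_DIR = "/usr/local/share/nltk_data"

# A subset of the packages from libs.py, kept short for illustration.
PACKAGES = ["wordnet", "punkt", "averaged_perceptron_tagger"]


def download_packages(packages, download_dir):
    """Download each package into download_dir; return the names that failed."""
    import nltk  # imported lazily so this module loads even without NLTK

    os.makedirs(download_dir, exist_ok=True)
    failed = []
    for name in packages:
        # nltk.download() is expected to return False when a download fails.
        if not nltk.download(name, download_dir=download_dir, quiet=True):
            failed.append(name)
    return failed


def register_data_dir(search_path, data_dir):
    """Append data_dir to an NLTK-style search path list if it is missing."""
    if data_dir not in search_path:
        search_path.append(data_dir)
    return search_path
```

At build time the script would call download_packages(PACKAGES, NLTK_DATA_DIR) and exit with a non-zero status if the returned list is not empty, so that a failed download breaks the image build instead of surfacing later as a LookupError. At run time the application would call register_data_dir(nltk.data.path, NLTK_DATA_DIR) before using any tokenizer or corpus.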
Answer 1 (score: 0)
You have to set nltk.data.path.append('/path/to/nltk_data') in your settings.py file. The rest of the procedure is the same: libs.py holds all the package details.
After that, add the following to your Dockerfile:
RUN pip install nltk
RUN python nltk_pkg.py
COPY start.sh /nltk_pkg.py
COPY start.sh /start.sh
It worked for me.