Installing nltk on Docker

Date: 2017-10-31 13:36:46

Tags: python docker nltk dockerfile

I am new to Docker and I am trying to install some nltk packages in a Docker image. Here is my Dockerfile:

FROM python:3-onbuild

RUN python -m libs.py

COPY start.sh /libs.py

COPY start.sh /start.sh

EXPOSE 8000

CMD ["/start.sh"]

Here is my libs.py, which lists the nltk packages to download:

import nltk
nltk.data.path.append('./')
nltk.download('wordnet')
nltk.download('pros_cons')
nltk.download('snowball_data')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_ru')
nltk.download('punkt')
nltk.download('universal_tagset')
nltk.download('maxent_treebank_pos_tagger')
nltk.download('hmm_treebank_pos_tagger')
nltk.download('reuters')
nltk.download('treebank')
nltk.download('vader_lexicon')
nltk.download('porter_test')
nltk.download('rslp')

The Docker image builds successfully, but when I try to use these packages it throws an error:

LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/local/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

Can someone tell me why the nltk packages are not being installed? Thanks.

2 answers:

Answer 0: (score: 0)

It looks like you need to create a user inside Docker. You should try to avoid running as root in Docker (which is the default).

However, you can set download_dir when using nltk.download():

  

download(self, info_or_id=None, download_dir=None, quiet=False, force=False, prefix='[nltk_data]', halt_on_error=True, raise_on_error=False):

If no value is set for download_dir, it will try to save to the default path:

    # decide where we're going to save things to.
    if self._download_dir is None:
        self._download_dir = self.default_download_dir()

More specifically: https://github.com/nltk/nltk/blob/develop/nltk/downloader.py#L919

def default_download_dir(self):
    """
    Return the directory to which packages will be downloaded by
    default.  This value can be overridden using the constructor,
    or on a case-by-case basis using the ``download_dir`` argument when
    calling ``download()``.
    On Windows, the default download directory is
    ``PYTHONHOME/lib/nltk``, where *PYTHONHOME* is the
    directory containing Python, e.g. ``C:\\Python25``.
    On all other platforms, the default directory is the first of
    the following which exists or which can be created with write
    permission: ``/usr/share/nltk_data``, ``/usr/local/share/nltk_data``,
    ``/usr/lib/nltk_data``, ``/usr/local/lib/nltk_data``, ``~/nltk_data``.
    """

So it saves the files in /root/nltk_data/.
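To make the fallback rule concrete, here is a minimal sketch of the selection logic described in the docstring, using only the standard library. It is a simplified model, not NLTK's actual implementation: the helper name pick_download_dir is hypothetical, and it only checks "exists and is writable" rather than also trying to create missing directories.

```python
import os
import tempfile


def pick_download_dir(candidates, fallback):
    """Return the first candidate that exists and is writable, else the fallback.

    Hypothetical helper: a simplified model of the rule quoted in the
    docstring above, not the code NLTK really runs.
    """
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return fallback


# Demo inside a throwaway directory, so nothing on the real system is touched.
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "share", "nltk_data")
    os.makedirs(existing)
    missing = os.path.join(tmp, "lib", "nltk_data")
    home_fallback = os.path.join(tmp, "root", "nltk_data")

    # No system-wide directory exists yet: the home-style fallback is chosen.
    chosen_when_none_exist = pick_download_dir([missing], home_fallback)
    # A writable system-wide directory exists: it wins over the fallback.
    chosen_when_one_exists = pick_download_dir([missing, existing], home_fallback)

    print(chosen_when_none_exist)
    print(chosen_when_one_exists)
```

In a fresh python image none of the system-wide nltk_data directories is likely to exist, which would explain why a download run as root ends up under the home directory.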

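When you run the Docker image, it looks like you are accessing the / directory through CMD ["/start.sh"], so there may be some permission settings on /root/nltk_data that get in the way.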
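One way to test the permissions theory is to check, from inside the running container, which of the searched directories exist and are readable by the current user. The sketch below uses only the standard library; the directory list is copied from the traceback in the question, and the helper names describe_dir and report are hypothetical.

```python
import os

# Directories listed under "Searched in:" in the traceback from the question.
SEARCHED = [
    "/root/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
    "/usr/local/nltk_data",
]


def describe_dir(path):
    """Report whether path exists and whether the current user can read it."""
    exists = os.path.isdir(path)
    # Listing a directory needs both read and execute (search) permission.
    readable = exists and os.access(path, os.R_OK | os.X_OK)
    return {"path": path, "exists": exists, "readable": readable}


def report(paths=SEARCHED):
    """Describe every directory in paths, in order."""
    return [describe_dir(p) for p in paths]


for entry in report():
    print(entry)
```

If every entry shows exists as False, the data was never written into the image (or was written somewhere else). If a directory exists but is not readable, the problem is permissions.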

In short:

Explicitly set the path of the directory that nltk_data is downloaded into:

nltk.download('popular', download_dir='/path/to/nltk_data/')

And when running a new Python instance, add that directory to the search path:

nltk.data.path.append('/path/to/nltk_data/')
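Putting the two steps together, a build-time script and the application could share one directory constant. This is only a sketch under some assumptions: the names NLTK_DATA_DIR, download_packages and register_data_dir are hypothetical, the package list is shortened, and /usr/local/share/nltk_data was picked because it already appears in the "Searched in" list of the traceback. nltk is imported inside the function so the module can be loaded even where nltk is not installed.

```python
import os

# Assumption: a fixed, system-wide location that is not tied to any one user.
NLTK_DATA_DIR = "/usr/local/share/nltk_data"

# Shortened list for illustration; use the full list from libs.py in practice.
PACKAGES = ["wordnet", "punkt", "averaged_perceptron_tagger"]


def download_packages(packages=PACKAGES, target=NLTK_DATA_DIR):
    """Download each package into target; intended to run at image build time."""
    import nltk  # lazy import: only needed when a download is actually run

    os.makedirs(target, exist_ok=True)
    for name in packages:
        # nltk.download is expected to return a falsy value on failure.
        if not nltk.download(name, download_dir=target):
            raise RuntimeError("failed to download nltk package: " + name)


def register_data_dir(search_path, target=NLTK_DATA_DIR):
    """Append target to an nltk-style search path list if it is not there yet."""
    if target not in search_path:
        search_path.append(target)
    return search_path
```

In libs.py you would call download_packages(), and in the application you would call register_data_dir(nltk.data.path) before using any resource. Raising on a failed download makes the image build fail early rather than producing an image without the data.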

See also: How to config nltk data directory from code?

Answer 1: (score: 0)

You have to set nltk.data.path.append('/path/to/nltk_data') in your settings.py file; the rest of the procedure is the same.

libs.py contains all the package details.

After that, add this to your Dockerfile:

RUN pip install nltk

RUN python nltk_pkg.py

COPY start.sh /nltk_pkg.py

COPY start.sh /start.sh

It worked for me.