Question

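The server where I run my NLTK tests cannot directly reach the external NLTK models at http://www.nltk.org/nltk_data/, but we do have a private mirror set up for accessing them.

How can I tell the NLTK downloader to install from the private mirror instead of http://www.nltk.org/nltk_data/?

I expected this to work, but it does not: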

>>> nltk.downloader.Downloader(server_index_url='https://MyNltkMirror/index.xml').download()
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
  Identifier> abc
    Downloading package abc to /path/to/nltk_data...
    Error downloading 'abc' from
        <https://raw.githubusercontent.com/nltk/nltk_data/gh-
        pages/packages/corpora/abc.zip>:   <urlopen error [Errno 104]
        Connection reset by peer>

Or am I doing this correctly, and the real problem is that my server cannot connect to raw.githubusercontent.com?

Thanks.

Answer 1

Try downloading the packages without using interactive mode.

import nltk.downloader

# URL of the index.xml served by your mirror.
mirror_url = "http://example.com/my_corpus_data/index.xml"
dler = nltk.downloader.Downloader(mirror_url)

# Download the package(s) directly, without the interactive prompt.
dler.download('popular')
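
The error in the question shows the download being attempted from raw.githubusercontent.com even though the index came from the mirror. That suggests the mirror's index.xml still carries the upstream package URLs in its `url` attributes, and as far as I can tell the downloader fetches each package from the URL listed in the index rather than deriving it from where the index lives. If that is the case here, the mirrored index.xml needs its URLs rewritten as well. A minimal sketch, assuming the mirror keeps the same `packages/...` layout as upstream; `rewrite_index_urls` is a hypothetical helper, not part of NLTK, and the upstream prefix is taken from the error message above:

```python
import xml.etree.ElementTree as ET

# Upstream prefix as it appears in the error message in the question.
UPSTREAM_PREFIX = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/"


def rewrite_index_urls(index_xml, mirror_prefix, upstream_prefix=UPSTREAM_PREFIX):
    """Return index XML with every upstream `url` attribute pointed at the mirror.

    Hypothetical helper: assumes the mirror uses the same relative paths
    (packages/corpora/abc.zip, ...) as the upstream repository.
    """
    root = ET.fromstring(index_xml)
    for elem in root.iter():
        url = elem.get("url")
        if url and url.startswith(upstream_prefix):
            elem.set("url", mirror_prefix + url[len(upstream_prefix):])
    return ET.tostring(root, encoding="unicode")


# Tiny stand-in for a real index.xml, with a single package entry.
sample = (
    '<nltk_data><packages>'
    '<package id="abc" url="https://raw.githubusercontent.com/nltk/nltk_data/'
    'gh-pages/packages/corpora/abc.zip"/>'
    '</packages></nltk_data>'
)
rewritten = rewrite_index_urls(sample, "https://MyNltkMirror/")
print(rewritten)
```

Serve the rewritten file as the mirror's index.xml, then point `Downloader` at it as shown above.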
