TreeTagger installed successfully but cannot open .par file

Time: 2013-03-19 15:17:28

Tags: installation nlp stemming pos-tagger lemmatization

Does anyone know how to resolve this file reading error in TreeTagger, a commonly used natural language processing tool for POS tagging, lemmatizing, and chunking sentences?

alvas@ikoma:~/treetagger$ echo 'Hello world!' | cmd/tree-tagger-english
    reading parameters ...
ERROR: Can't open for reading: /home/alvas/treetagger/lib/english.par
aborted.

alvas@ikoma:~$ mkdir treetagger
alvas@ikoma:~$ cd treetagger
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/tree-tagger-linux-3.2.tar.gz
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/tagger-scripts.tar.gz
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/install-tagger.sh
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/dutch-par-linux-3.2-utf8.bin.gz
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/german-par-linux-3.2-utf8.bin.gz
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/italian-par-linux-3.2-utf8.bin.gz
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/spanish-par-linux-3.2-utf8.bin.gz
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/french-par-linux-3.2-utf8.bin.gz

alvas@ikoma:~/treetagger$ sh install-tagger.sh 

Linux version of TreeTagger installed.
Tagging scripts installed.
German parameter file (Linux, UTF8) installed.
German chunker parameter file (Linux) installed.
French parameter file (Linux, UTF8) installed.
French chunker parameter file (Linux, UTF8) installed.
Italian parameter file (Linux, UTF8) installed.
Spanish parameter file (Linux, UTF8) installed.
Dutch parameter file (Linux, UTF8) installed.
Path variables modified in tagging scripts.

You might want to add /home/alvas/treetagger/cmd and /home/alvas/treetagger/bin to the PATH variable so that you do not need to specify the full path to run the tagging scripts.

I did not run into any of the possible installation problems hinted at in http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/installation-hints.txt. I followed the instructions on the web page (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/#Linux), and the installation shown above completed properly.

But when I try to test the software, I get these errors:

alvas@ikoma:~/treetagger$ echo 'Hello world!' | cmd/tree-tagger-english
    reading parameters ...

ERROR: Can't open for reading: /home/alvas/treetagger/lib/english.par
aborted.
alvas@ikoma:~/treetagger$ echo 'Das ist ein Test.' | cmd/tagger-chunker-german

ERROR: Can't open for reading: /home/alvas/treetagger/lib/german-chunker.par
aborted.

ERROR: Can't open for reading: /home/alvas/treetagger/lib/german.par
aborted.
    reading parameters ...

ERROR: Can't open for reading: /home/alvas/treetagger/lib/german.par
aborted.

3 Answers:

Answer 0: (score: 5)

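I think there are two problems. First, the script name should include "-utf8", e.g. cmd/tagger-chunker-german-utf8, because you downloaded the UTF-8 data. Second, tagging and chunking each need their own data file. See the home page, which has the sections "Parameter files for PC" and "Chunker parameter files for PC": download the files from both sections, then rerun install-tagger.sh.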
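To see which data files are actually missing before downloading anything, it can help to check lib/ directly. The following is only a sketch: check_par_files is a hypothetical helper, not something shipped with TreeTagger, and the file names passed to it are simply the ones that appear in the error messages in the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of TreeTagger): report which parameter
# files are present and readable under ROOT/lib.
# Usage: check_par_files ROOT NAME...
check_par_files() {
    root=$1
    shift
    missing=0
    for name in "$@"; do
        if [ -r "$root/lib/$name" ]; then
            echo "ok      $root/lib/$name"
        else
            echo "MISSING $root/lib/$name"
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero when at least one file is absent or unreadable.
    [ "$missing" -eq 0 ]
}
```

For the setup in the question, it could be called as `check_par_files "$HOME/treetagger" english.par german.par german-chunker.par`. Any file reported as MISSING would need to be downloaded and installed before the corresponding script can run.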

Answer 1: (score: 0)

You wrote cmd/tree-tagger-english, but I think the correct path (where the parameter files are) is:

lib/tree-tagger-english

Answer 2: (score: 0)

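I had the same problem. I realized that the .par files I had downloaded for the languages I needed had not been extracted (they were still inside the .gz archives).

Make sure you extract them into the directory first, then try again.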
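A minimal sketch of that extraction step, assuming the downloaded .gz archives are still sitting in the install directory. extract_gz_files is a hypothetical helper name, and where the extracted files must finally be placed or how they must be named is not stated here, so the sketch only decompresses them next to the archives.

```shell
#!/bin/sh
# Hypothetical helper: decompress every plain .gz file in DIR, keeping the
# original archives. Tarballs (*.tar.gz) are skipped.
# Usage: extract_gz_files DIR
extract_gz_files() {
    dir=$1
    for archive in "$dir"/*.gz; do
        [ -e "$archive" ] || continue      # the glob matched nothing
        case "$archive" in
            *.tar.gz) continue ;;          # leave tarballs for tar
        esac
        target=${archive%.gz}
        if [ -e "$target" ]; then
            echo "already extracted: $target"
        elif gzip -dc "$archive" > "$target"; then
            echo "extracted: $target"
        else
            rm -f "$target"                # do not leave a partial file behind
            echo "failed: $archive" >&2
        fi
    done
}
```

Running `extract_gz_files "$HOME/treetagger"` and then repeating the installation or the failing command is one way to check whether compressed archives were the cause.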