Question

I'm trying to use koRpus in R for lemmatization on a Linux server running RHEL6. Last week, when I had MRO (Microsoft R Open) 3.2.3 installed, the code below worked fine:

library(koRpus)
lw = c("dancing","flying","flew")
res = treetag(lw, treetagger="manual", format="obj", TT.tknz=FALSE, lang="en",
        TT.options=list(path="/usr/local/bin/TreeTagger", preset="en"))

Now I'm running MRO 3.3.0, and I get the following error:

Error in grepl("(^\\p{P}*\\p{L}\\p{M}*\\.)", tkn, perl = TRUE) :
  invalid regular expression '(^\p{P}*\p{L}\p{M}*\.)'
In addition: Warning message:
In grepl("(^\\p{P}*\\p{L}\\p{M}*\\.)", tkn, perl = TRUE) :
  PCRE pattern compilation error
        'support for \P, \p, and \X has not been compiled'
        at 'p{P}*\p{L}\p{M}*\.)'

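OK, so this means my PCRE needs to be recompiled with Unicode property support. Sure enough, running the code below confirms that this is exactly the problem: UTF-8 is supported, but Unicode properties are not. It also shows that I'm running PCRE 8.37.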

pcre_config()
#>         UTF-8 Unicode properties                JIT
#>          TRUE              FALSE              FALSE

extSoftVersion()
#>                 zlib                     bzlib                        xz
#>              "1.2.8"      "1.0.6, 6-Sept-2010"                   "5.2.2"
#>                 PCRE                       ICU                       TRE
#>    "8.37 2015-04-28"                    "57.1" "TRE 0.8.0 R_fixes (BSD)"
#>                iconv
#>         "glibc 2.12"
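The same diagnosis can be made by actually trying to compile a `\p{...}` pattern, which tests the library R really uses rather than a reported flag. A minimal sketch; `pcre_has_unicode_props()` is a hypothetical helper, not part of koRpus or base R:

```r
# Hypothetical helper (not part of koRpus or base R): TRUE if the PCRE library
# this R session uses can compile Unicode property escapes such as \p{L}.
pcre_has_unicode_props <- function() {
  ok <- tryCatch(
    # grepl() emits a warning and then an error when the pattern fails to compile;
    # the warning is muffled and the error is turned into FALSE
    suppressWarnings(grepl("\\p{L}", "a", perl = TRUE)),
    error = function(e) FALSE
  )
  isTRUE(ok)
}

pcre_has_unicode_props()                # should agree with the flag below
pcre_config()[["Unicode properties"]]   # build flag reported by R
extSoftVersion()[["PCRE"]]              # version string of the PCRE in use
```

Re-running this after any reinstall shows whether R has picked up a different PCRE build.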

So I went ahead and built and installed PCRE 8.39, (hopefully) with the right flags set:

./configure --enable-utf8 --enable-unicode-properties
make
make install

Now when I run pcretest -C, I get:

PCRE version 8.39 2016-06-14
Compiled with
  8-bit support
  UTF-8 support
  Unicode properties support
  No just-in-time compiler support
  Newline sequence is LF
  \R matches all Unicode newlines
  Internal link size = 2
  POSIX malloc threshold = 10
  Parentheses nest limit = 250
  Default match limit = 10000000
  Default recursion depth limit = 10000000
  Match recursion uses stack

But when I restart R, pcre_config() gives the same result, the treetag call fails in the same way, and extSoftVersion() still reports 8.37.

What do I need to do to get R to use the new PCRE version?

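Digging deeper... As of 3.3.0, R apparently no longer ships with PCRE (per https://mran.microsoft.com/news/#r330), so it (supposedly) relies on the system's PCRE installation. I removed every identifiable PCRE file from the server (a complete uninstall), and R still reports PCRE 8.37 2015-04-28. This suggests that MRO 3.3.0 for RHEL 6 does include its own PCRE, despite claims to the contrary. Moreover, the grepl call still fails with the same error, so this isn't just a reporting quirk of extSoftVersion().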
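One way to check where that PCRE 8.37 comes from is to look for PCRE files inside the R installation and at the shared-library dependencies of libR.so. A minimal sketch, assuming a Linux build of R with a shared libR.so and an `ldd` binary on the PATH; `pcre_sources()` is a hypothetical helper:

```r
# Hypothetical helper: report where this R session's PCRE could be coming from.
# Assumes Linux, a shared-library build of R (libR.so), and ldd on the PATH.
pcre_sources <- function() {
  libdir <- R.home("lib")
  # PCRE shared objects bundled inside the R installation itself, if any
  bundled <- list.files(libdir, pattern = "pcre", full.names = TRUE)
  # Lines of `ldd libR.so` output that mention pcre, if any
  libR <- file.path(libdir, "libR.so")
  linked <- character(0)
  if (file.exists(libR) && nzchar(Sys.which("ldd"))) {
    deps <- system2("ldd", libR, stdout = TRUE)
    linked <- grep("pcre", deps, value = TRUE)
  }
  list(bundled = bundled, linked = linked)
}

pcre_sources()
```

If both lists come back empty, one plausible explanation (an assumption, not confirmed by the MRO notes) is that PCRE was linked statically into libR.so, which would explain why removing or upgrading the system PCRE has no effect.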

R - Regular expression error (PCRE version)
