SBCL Run-Program (Stanford Parser) or Redirecting I/O in Unix

Time: 2016-08-25 23:15:06

Tags: common-lisp stanford-nlp named-pipes sbcl

I'm having trouble spawning the Stanford Parser as a subprocess in SBCL Lisp:

(defvar *p* (sb-ext:run-program "/usr/bin/java"
   (list     "-cp"
    "\"/home/todd/CoreNLP/*\""
    "-Xmx2g"
    "edu.stanford.nlp.pipeline.StanfordCoreNLP"
    "-annotators"
    "tokenize,ssplit,pos,lemma,ner,parse,dcoref"
    "-outputFormat"
    "text")
    :wait nil :input :stream :output :stream :error :output))

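It looks like it starts the program, and then the parser just dies. I can't show exactly what happens, because this text window reformats my text into something else. In any case, this does not happen with other programs I try to run: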

(defvar *g* (sb-ext:run-program "/usr/bin/gnuplot" nil
                                :wait nil
                                :input :stream
                                :output :stream
                                :error :output))

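In this case, the program (gnuplot) keeps running.

I wonder whether the Stanford Parser takes so long to start up that Lisp gives up on it.

If anyone has any insight into this, I'd be thrilled. This would be the ideal way to talk to the Stanford Parser from within Lisp. Otherwise, I have what may be a perfectly workable workaround: start the parser with its input coming from, and its output going to, named pipes in the file system. This has to happen in the command-line options above, because the program has to be in interactive mode (the parser produces a different kind of output when it is not in interactive mode).

However, this veers off topic into a Unix question, so it is only for the case that someone is an expert:

Suppose I have an inpipe and an outpipe in the CoreNLP directory. What is the command line to start the parser so that inpipe and outpipe are connected to the program's stdin and stdout, respectively? Are there any steps I can take (at this point) to make sure I don't run into buffering problems later, when I access the pipes from a Lisp program?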
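The named-pipe arrangement asked about here can be sketched in shell. This is a minimal illustration under stated assumptions, not a tested CoreNLP setup: `tr` stands in for the parser so the sketch runs anywhere, the pipes are created in a temporary directory rather than the CoreNLP directory, and file descriptors 3 and 4 are arbitrary choices. With the real parser, the `tr` line would presumably be the java command from the question, with the same `< inpipe > outpipe` redirections.

```shell
#!/usr/bin/env bash
# Sketch only: `tr` is a stand-in for the parser process.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Create the two named pipes (FIFOs).
mkfifo "$workdir/inpipe" "$workdir/outpipe"

# Start the stand-in with stdin/stdout attached to the pipes.
# Opening a FIFO blocks until the other end is opened, so run it in the background.
tr 'a-z' 'A-Z' < "$workdir/inpipe" > "$workdir/outpipe" &

# Open our ends: fd 3 writes to the program, fd 4 reads from it.
exec 3> "$workdir/inpipe"
exec 4< "$workdir/outpipe"

# Send one line, then close the write end so the program sees end-of-file.
printf 'hello parser\n' >&3
exec 3>&-

# Read everything the program wrote, then clean up.
reply=$(cat <&4)
exec 4<&-
wait
echo "$reply"
```

On buffering: `tr` block-buffers its output when writing to a pipe, so in this sketch the reply only arrives once the input is closed. Whether the parser flushes after each sentence is decided inside the Java program; the shell redirections themselves add no buffering.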

Does anyone have any ideas on how to talk to the Stanford Parser from within Lisp?

Any insight would be appreciated, as always.

-Todd

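1 Answer:

Answer 0: (score: 4)

I suggest you use inferior-shell to execute commands in Common Lisp.

I have never used the Stanford Parser, so I installed it on my Mac with Homebrew, after which I could use it from the command line: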

 2016-08-26 09:04:06 ☆ |ruby-2.2.3@laguna| Antonios-MBP in ~/learn/lisp/cl-l/stackoverflow/scripts
± |master ?:2 ✗| → lexparser.sh text.txt
[main] INFO edu.stanford.nlp.parser.lexparser.LexicalizedParser - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...
 done [0.6 sec].
Parsing file: text.txt
Parsing [sent. 1 len. 42]: The strongest rain ever recorded in India shut down the financial hub of Mumbai , snapped communication lines , closed airports and forced thousands of people to sleep in their offices or walk home during the night , officials said today .
(ROOT
  (S
    (S
      (NP
        (NP (DT The) (JJS strongest) (NN rain))
        (VP
          (ADVP (RB ever))
          (VBN recorded)
          (PP (IN in)
            (NP (NNP India)))))
      (VP
        (VP (VBD shut)
          (PRT (RP down))
          (NP
            (NP (DT the) (JJ financial) (NN hub))
            (PP (IN of)
              (NP (NNP Mumbai)))))
        (, ,)
        (VP (VBD snapped)
          (NP (NN communication) (NNS lines)))
        (, ,)
        (VP (VBD closed)
          (NP (NNS airports)))
        (CC and)
        (VP (VBD forced)
          (NP
            (NP (NNS thousands))
            (PP (IN of)
              (NP (NNS people))))
          (S
            (VP (TO to)
              (VP
                (VP (VB sleep)
                  (PP (IN in)
                    (NP (PRP$ their) (NNS offices))))
                (CC or)
                (VP (VB walk)
                  (NP (NN home))
                  (PP (IN during)
                    (NP (DT the) (NN night))))))))))
    (, ,)
    (NP (NNS officials))
    (VP (VBD said)
      (NP (NN today)))
    (. .)))

det(rain-3, The-1)
amod(rain-3, strongest-2)
nsubj(shut-8, rain-3)
nsubj(snapped-16, rain-3)
nsubj(closed-20, rain-3)
nsubj(forced-23, rain-3)
advmod(recorded-5, ever-4)
acl(rain-3, recorded-5)
case(India-7, in-6)
nmod:in(recorded-5, India-7)
ccomp(said-40, shut-8)
compound:prt(shut-8, down-9)
det(hub-12, the-10)
amod(hub-12, financial-11)
dobj(shut-8, hub-12)
case(Mumbai-14, of-13)
nmod:of(hub-12, Mumbai-14)
conj:and(shut-8, snapped-16)
ccomp(said-40, snapped-16)
compound(lines-18, communication-17)
dobj(snapped-16, lines-18)
conj:and(shut-8, closed-20)
ccomp(said-40, closed-20)
dobj(closed-20, airports-21)
cc(shut-8, and-22)
conj:and(shut-8, forced-23)
ccomp(said-40, forced-23)
dobj(forced-23, thousands-24)
nsubj(sleep-28, thousands-24)
nsubj(walk-33, thousands-24)
case(people-26, of-25)
nmod:of(thousands-24, people-26)
mark(sleep-28, to-27)
xcomp(forced-23, sleep-28)
case(offices-31, in-29)
nmod:poss(offices-31, their-30)
nmod:in(sleep-28, offices-31)
cc(sleep-28, or-32)
xcomp(forced-23, walk-33)
conj:or(sleep-28, walk-33)
dobj(walk-33, home-34)
case(night-37, during-35)
det(night-37, the-36)
nmod:during(walk-33, night-37)
nsubj(said-40, officials-39)
root(ROOT-0, said-40)
nmod:tmod(said-40, today-41)

Parsed file: text.txt [1 sentences].
Parsed 42 words in 1 sentences (18.00 wds/sec; 0.43 sents/sec).

This actually executes a shell script, which is basically a java command:

 2016-08-26 09:04:24 ☆ |ruby-2.2.3@laguna| Antonios-MBP in ~/learn/lisp/cl-l/stackoverflow/scripts
± |master ?:2 ✗| → cat /usr/local/Cellar/stanford-parser/3.6.0/libexec/lexparser.sh
#!/usr/bin/env bash
#
# Runs the English PCFG parser on one or more files, printing trees only

if [ ! $# -ge 1 ]; then
  echo Usage: `basename $0` 'file(s)'
  echo
  exit
fi

scriptdir=`dirname $0`

java -mx150m -cp "$scriptdir/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser \
 -outputFormat "penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz $*
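
One detail of this script may bear on the question. The double quotes around `"$scriptdir/*:"` are shell syntax: the shell removes them (and skips glob expansion), so java receives the bare classpath and expands the wildcard itself. `sb-ext:run-program` hands each list element to the program as-is, without a shell, so quotes embedded in an argument string, like the `"\"/home/todd/CoreNLP/*\""` in the question, would reach java as part of the classpath. Whether that is what killed the parser there is an assumption, not something confirmed in this thread. The sketch below uses a hypothetical helper, `show_args`, to print the arguments a program actually receives in each case:

```shell
#!/usr/bin/env bash
# show_args is a hypothetical helper: print each received argument in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Quotes used as shell syntax: removed before the program sees the argument.
plain=$(show_args -cp "/home/todd/CoreNLP/*")

# Quotes embedded in the argument itself (what the Lisp string "\"...\"" holds):
# they are passed through literally and become part of the path.
literal=$(show_args -cp '"/home/todd/CoreNLP/*"')

echo "$plain"
echo "$literal"
```

If this is the cause, dropping the escaped quotes from the Lisp string would be the thing to try first.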

With that, I had everything I needed to run it from Common Lisp.

First, install inferior-shell with Quicklisp:

CL-USER> (ql:quickload 'inferior-shell)
To load "inferior-shell":
  Load 1 ASDF system:
    inferior-shell
; Loading "inferior-shell"

(INFERIOR-SHELL)

Then check that it works:

CL-USER> (inferior-shell:run/ss '(lexparser.sh))
"Usage: lexparser.sh file(s)
"
NIL
0

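Perfect: it executes lexparser and returns three values: the standard output as a string, nil for standard error, and 0 as the exit status of the program.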
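
These values line up with what a shell sees when it runs the same script. As a rough sketch, the function below copies the argument check from lexparser.sh (the function name `usage_check` and the `would parse` branch are made up for illustration) and captures its standard output and exit status separately:

```shell
#!/usr/bin/env bash
# usage_check is a hypothetical stand-in for the argument check in lexparser.sh.
usage_check() {
  if [ ! $# -ge 1 ]; then
    echo "Usage: lexparser.sh file(s)"
    echo
    return   # like the script's bare `exit`: status of the last command
  fi
  echo "would parse: $*"
}

# Capture standard output and exit status, as run/ss reports them.
out=$(usage_check)
status=$?

printf 'stdout=%s status=%d\n' "$out" "$status"
```

The bare `exit` in the script (a bare `return` here) reports the status of the last command run, the `echo`, which is why the usage message still comes with exit status 0.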

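Finally, prepare some text; I picked the sample from their website:

text.txt:

The strongest rain ever recorded in India shut down the financial hub of Mumbai, snapped communication lines, closed airports and forced thousands of people to sleep in their offices or walk home during the night, officials said today.

Then I execute it: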

CL-USER> (inferior-shell:run/ss '(lexparser.sh text.txt))
"(ROOT
  (S
    (S
      (NP
        (NP (DT The) (JJS strongest) (NN rain))
        (VP
          (ADVP (RB ever))
          (VBN recorded)
          (PP (IN in)
            (NP (NNP India)))))
      (VP
        (VP (VBD shut)
          (PRT (RP down))
          (NP
            (NP (DT the) (JJ financial) (NN hub))
            (PP (IN of)
              (NP (NNP Mumbai)))))
        (, ,)
        (VP (VBD snapped)
          (NP (NN communication) (NNS lines)))
        (, ,)
        (VP (VBD closed)
          (NP (NNS airports)))
        (CC and)
        (VP (VBD forced)
          (NP
            (NP (NNS thousands))
            (PP (IN of)
              (NP (NNS people))))
          (S
            (VP (TO to)
              (VP
                (VP (VB sleep)
                  (PP (IN in)
                    (NP (PRP$ their) (NNS offices))))
                (CC or)
                (VP (VB walk)
                  (NP (NN home))
                  (PP (IN during)
                    (NP (DT the) (NN night))))))))))
    (, ,)
    (NP (NNS officials))
    (VP (VBD said)
      (NP (NN today)))
    (. .)))

det(rain-3, The-1)
amod(rain-3, strongest-2)
nsubj(shut-8, rain-3)
nsubj(snapped-16, rain-3)
nsubj(closed-20, rain-3)
nsubj(forced-23, rain-3)
advmod(recorded-5, ever-4)
acl(rain-3, recorded-5)
case(India-7, in-6)
nmod:in(recorded-5, India-7)
ccomp(said-40, shut-8)
compound:prt(shut-8, down-9)
det(hub-12, the-10)
amod(hub-12, financial-11)
dobj(shut-8, hub-12)
case(Mumbai-14, of-13)
nmod:of(hub-12, Mumbai-14)
conj:and(shut-8, snapped-16)
ccomp(said-40, snapped-16)
compound(lines-18, communication-17)
dobj(snapped-16, lines-18)
conj:and(shut-8, closed-20)
ccomp(said-40, closed-20)
dobj(closed-20, airports-21)
cc(shut-8, and-22)
conj:and(shut-8, forced-23)
ccomp(said-40, forced-23)
dobj(forced-23, thousands-24)
nsubj(sleep-28, thousands-24)
nsubj(walk-33, thousands-24)
case(people-26, of-25)
nmod:of(thousands-24, people-26)
mark(sleep-28, to-27)
xcomp(forced-23, sleep-28)
case(offices-31, in-29)
nmod:poss(offices-31, their-30)
nmod:in(sleep-28, offices-31)
cc(sleep-28, or-32)
xcomp(forced-23, walk-33)
conj:or(sleep-28, walk-33)
dobj(walk-33, home-34)
case(night-37, during-35)
det(night-37, the-36)
nmod:during(walk-33, night-37)
nsubj(said-40, officials-39)
root(ROOT-0, said-40)
nmod:tmod(said-40, today-41)
"
NIL
0

Or I can collect the results in a list:

CL-USER> (multiple-value-list (inferior-shell:run/ss '(lexparser.sh text.txt)))
("(ROOT
  (S
    (S
      (NP
        (NP (DT The) (JJS strongest) (NN rain))
        (VP
          (ADVP (RB ever))
          (VBN recorded)
          (PP (IN in)
            (NP (NNP India)))))
      (VP
        (VP (VBD shut)
          (PRT (RP down))
          (NP
            (NP (DT the) (JJ financial) (NN hub))
            (PP (IN of)
              (NP (NNP Mumbai)))))
        (, ,)
        (VP (VBD snapped)
          (NP (NN communication) (NNS lines)))
        (, ,)
        (VP (VBD closed)
          (NP (NNS airports)))
        (CC and)
        (VP (VBD forced)
          (NP
            (NP (NNS thousands))
            (PP (IN of)
              (NP (NNS people))))
          (S
            (VP (TO to)
              (VP
                (VP (VB sleep)
                  (PP (IN in)
                    (NP (PRP$ their) (NNS offices))))
                (CC or)
                (VP (VB walk)
                  (NP (NN home))
                  (PP (IN during)
                    (NP (DT the) (NN night))))))))))
    (, ,)
    (NP (NNS officials))
    (VP (VBD said)
      (NP (NN today)))
    (. .)))

det(rain-3, The-1)
amod(rain-3, strongest-2)
nsubj(shut-8, rain-3)
nsubj(snapped-16, rain-3)
nsubj(closed-20, rain-3)
nsubj(forced-23, rain-3)
advmod(recorded-5, ever-4)
acl(rain-3, recorded-5)
case(India-7, in-6)
nmod:in(recorded-5, India-7)
ccomp(said-40, shut-8)
compound:prt(shut-8, down-9)
det(hub-12, the-10)
amod(hub-12, financial-11)
dobj(shut-8, hub-12)
case(Mumbai-14, of-13)
nmod:of(hub-12, Mumbai-14)
conj:and(shut-8, snapped-16)
ccomp(said-40, snapped-16)
compound(lines-18, communication-17)
dobj(snapped-16, lines-18)
conj:and(shut-8, closed-20)
ccomp(said-40, closed-20)
dobj(closed-20, airports-21)
cc(shut-8, and-22)
conj:and(shut-8, forced-23)
ccomp(said-40, forced-23)
dobj(forced-23, thousands-24)
nsubj(sleep-28, thousands-24)
nsubj(walk-33, thousands-24)
case(people-26, of-25)
nmod:of(thousands-24, people-26)
mark(sleep-28, to-27)
xcomp(forced-23, sleep-28)
case(offices-31, in-29)
nmod:poss(offices-31, their-30)
nmod:in(sleep-28, offices-31)
cc(sleep-28, or-32)
xcomp(forced-23, walk-33)
conj:or(sleep-28, walk-33)
dobj(walk-33, home-34)
case(night-37, during-35)
det(night-37, the-36)
nmod:during(walk-33, night-37)
nsubj(said-40, officials-39)
root(ROOT-0, said-40)
nmod:tmod(said-40, today-41)
" NIL 0)

Keep in mind that this program uses Java 8, and I am using stanford-parser 3.6.0.