Chunking Stanford Parser

Time: 2017-07-27 19:32:17

Tags: python nltk stanford-nlp

I'd like to know how to use Regex to chunk a tree structure generated by NLTK.
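
For reference, NLTK's regular-expression chunker, nltk.RegexpParser, matches tag patterns against a flat list of (word, tag) pairs; it does not walk a nested parse tree. Below is a minimal sketch, assuming the POS tags are simply read off the Stanford parse of this question's sentence with Tree.pos(). The grammar NP: {<CD>?<NN.*>+} and the helper name regex_chunks are hypothetical, chosen only for illustration.

```python
from nltk import RegexpParser, Tree

# Stanford Parser output from the question, as a bracketed string.
PARSE = """
(ROOT
  (NP
    (NP (NNP New) (NNP Jersey))
    (NP
      (NP (NNS Imports))
      (PP
        (IN of)
        (NP
          (NP (NNP Residual) (NN Fuel) (NN Oil))
          (, ,)
          (NP
            (NP (NNP Greater))
            (PP (IN Than) (NP (ADJP (CD 1) (NN %)) (NN Sulfur)))))))))
"""

# Hypothetical grammar: an optional number followed by one or more noun tags.
GRAMMAR = r"NP: {<CD>?<NN.*>+}"


def regex_chunks(parse_str, grammar=GRAMMAR):
    """Flatten a bracketed parse to (word, tag) pairs and regex-chunk them."""
    tagged = Tree.fromstring(parse_str).pos()  # [('New', 'NNP'), ...]
    chunked = RegexpParser(grammar).parse(tagged)  # Tree('S', [...])
    # Each NP chunk's leaves are (word, tag) pairs; keep only the words.
    return [
        tuple(word for word, _tag in subtree.leaves())
        for subtree in chunked.subtrees(lambda t: t.label() == "NP")
    ]


if __name__ == "__main__":
    for chunk in regex_chunks(PARSE):
        print(chunk)
```

On this sentence the tag-only grammar returns ('New', 'Jersey', 'Imports'), ('Residual', 'Fuel', 'Oil'), ('Greater',) and ('1', '%', 'Sulfur'): because it sees only the tags, it cannot separate "New Jersey" from "Imports" the way the parse tree does.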

Take the sentence "New Jersey Imports of Residual Fuel Oil, Greater Than 1% Sulfur". The tree generated by the Stanford Parser looks like this:

(ROOT
  (NP
    (NP (NNP New) (NNP Jersey))
    (NP
      (NP (NNS Imports))
      (PP
        (IN of)
        (NP
          (NP (NNP Residual) (NN Fuel) (NN Oil))
          (, ,)
          (NP
            (NP (NNP Greater))
            (PP (IN Than) (NP (ADJP (CD 1) (NN %)) (NN Sulfur)))))))))

I want to extract the following phrases, marked here as tuples:

[('New', 'Jersey'), ('Imports',), ('Residual', 'Fuel', 'Oil'), ('Greater', 'Than', '1', '%', 'Sulfur')]

When I try to get all subtrees (excluding the tree itself), I get something like this:

(u'New', u'Jersey')
(u'Imports', u'of', u'Residual', u'Fuel', u'Oil', u',', u'Greater', u'Than', u'1', u'%', u'Sulfur')
(u'Imports',)
(u'of', u'Residual', u'Fuel', u'Oil', u',', u'Greater', u'Than', u'1', u'%', u'Sulfur')
(u'Residual', u'Fuel', u'Oil', u',', u'Greater', u'Than', u'1', u'%', u'Sulfur')
(u'Residual', u'Fuel', u'Oil')
(u'Greater', u'Than', u'1', u'%', u'Sulfur')
(u'Greater',)
(u'Than', u'1', u'%', u'Sulfur')
(u'1', u'%', u'Sulfur')
(u'1', u'%')

This is not ideal, because some of the chunks are very long and could be broken down further.

Can anyone point me in the right direction? Thanks!

[Edit] Partial code:

from nltk.parse.corenlp import CoreNLPParser
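
One way to use the tree structure itself, rather than listing every subtree, is to walk the tree top-down and keep an NP whole only when it is short enough, otherwise descending into its children. This is a hypothetical heuristic, not an NLTK API; the helper name capped_np_chunks and the max_len cutoff of 5 words are assumptions picked to fit this one sentence.

```python
from nltk import Tree

# Stanford Parser output from the question, as a bracketed string.
PARSE = """
(ROOT
  (NP
    (NP (NNP New) (NNP Jersey))
    (NP
      (NP (NNS Imports))
      (PP
        (IN of)
        (NP
          (NP (NNP Residual) (NN Fuel) (NN Oil))
          (, ,)
          (NP
            (NP (NNP Greater))
            (PP (IN Than) (NP (ADJP (CD 1) (NN %)) (NN Sulfur)))))))))
"""


def capped_np_chunks(tree, max_len=5):
    """Largest NP subtrees with at most max_len words, found top-down.

    Hypothetical heuristic: an NP that is short enough is kept whole; any
    other node (a long NP, a PP, the root, ...) is split into its children.
    """
    chunks = []
    for child in tree:
        if not isinstance(child, Tree):
            continue  # a bare word: nothing to chunk at this level
        if child.label() == "NP" and len(child.leaves()) <= max_len:
            chunks.append(tuple(child.leaves()))
        else:
            chunks.extend(capped_np_chunks(child, max_len))
    return chunks


if __name__ == "__main__":
    tree = Tree.fromstring(PARSE)
    print(capped_np_chunks(tree))
```

With max_len=5 this returns the four phrases listed in the question; with max_len=4 the last chunk is split into ('Greater',) and ('1', '%', 'Sulfur'). The word-count cutoff is only a crude proxy for "too long", so it would need tuning (or replacing with a label-based rule) on other sentences.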