How do I extract named entities such as PER, ORG, and GPE from the tree structure when binary=False?

Asked: 2017-02-02 21:51:36

Tags: python machine-learning nlp nltk stanford-nlp

I am new to nltk and am trying to extract PERSON, ORGANIZATION, and GPE entities with the following code:

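import nltk

# tokcomp is defined elsewhere; presumably a list of sentence strings
for i in tokcomp:
    words = nltk.word_tokenize(i)                    # split the sentence into tokens
    tagged = nltk.pos_tag(words)                     # attach a POS tag to each token
    namedEnt = nltk.ne_chunk(tagged, binary=False)   # binary=False keeps the entity types
    print(namedEnt)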

The output I get is:

(S
  Our/PRP$
  direct/JJ
  competitors/NNS
  include/VBP
  ,/,
  among/IN
  others/NNS
  ,/,
  (PERSON Accenture/NNP)
  ,/,
  (GPE Capgemini/NNP)
  ,/,
  (ORGANIZATION Computer/NNP Sciences/NNPS Corporation/NNP)
  ,/,
  (GPE Genpact/NNP)
  ,/,
  (ORGANIZATION HCL/NNP Technologies/NNPS)
  ,/,
  (ORGANIZATION HP/NNP Enterprise/NNP)
  ,/,
  (ORGANIZATION IBM/NNP Global/NNP Services/NNPS)
  ,/,
  (ORGANIZATION Infosys/NNP Technologies/NNPS)
  ,/,
  (PERSON Tata/NNP Consultancy/NNP Services/NNPS)
  and/CC
  (PERSON Wipro/NNP)
  ./.)
(S
  These/DT
  markets/NNS
  also/RB
  include/VBP
  numerous/JJ
  smaller/JJR
  local/JJ
  competitors/NNS
  in/IN
  the/DT
  various/JJ
  geographic/JJ
  markets/NNS
  in/IN
  which/WDT
  we/PRP
  operate/VBP
  which/WDT
  may/MD
  be/VB
  able/JJ
  to/TO
  provide/VB
  services/NNS
  and/CC
  solutions/NNS
  at/IN
  lower/JJR
  costs/NNS
  or/CC
  on/IN
  terms/NNS
  more/RBR
  attractive/JJ
  to/TO
  clients/NNS
  than/IN
  we/PRP
  can/MD
  ./.)
(S
  Our/PRP$
  direct/JJ
  competitors/NNS
  include/VBP
  ,/,
  among/IN
  others/NNS
  ,/,
  (PERSON Accenture/NNP)
  ,/,
  (GPE Capgemini/NNP)
  ,/,
  (ORGANIZATION Computer/NNP Sciences/NNPS Corporation/NNP)
  ,/,
  (GPE Genpact/NNP)
  ,/,
  (ORGANIZATION HCL/NNP Technologies/NNPS)
  ,/,
  (ORGANIZATION HP/NNP Enterprise/NNP)
  ,/,
  (ORGANIZATION IBM/NNP Global/NNP Services/NNPS)
  ,/,
  (ORGANIZATION Infosys/NNP Technologies/NNPS)
  ,/,
  (PERSON Tata/NNP Consultancy/NNP Services/NNPS)
  and/CC
  (PERSON Wipro/NNP)
  ./.)
(S
  The/DT
  rates/NNS
  we/PRP
  are/VBP
  able/JJ
  to/TO
  recover/VB
  for/IN
  our/PRP$
  services/NNS
  are/VBP
  affected/VBN
  by/IN
  a/DT
  number/NN
  of/IN
  factors/NNS
  ,/,
  including/VBG
  :/:
  •/VB
  our/PRP$
  clients’/JJ
  perceptions/NNS
  of/IN
  our/PRP$
  ability/NN
  to/TO
  add/VB
  value/NN
  through/IN
  our/PRP$
  services/NNS
  ;/:
  •/NNP
  introduction/NN
  of/IN
  new/JJ
  services/NNS
  or/CC
  products/NNS
  by/IN
  us/PRP
  or/CC
  our/PRP$
  competitors/NNS
  ;/:
  •/VB
  our/PRP$
  competitors’/NN
  pricing/NN
  policies/NNS
  ;/:
  •/VB
  our/PRP$
  ability/NN
  to/TO
  accurately/RB
  estimate/VB
  ,/,
  attain/NN
  and/CC
  sustain/NN
  contract/NN
  revenues/NNS
  ,/,
  margins/NNS
  and/CC
  cash/NN
  flows/NNS
  over/IN
  increasingly/RB
  longer/JJR
  contract/NN
  periods/NNS
  ;/:
  •/NNP
  bid/NN
  practices/NNS
  of/IN
  clients/NNS
  and/CC
  their/PRP$
  use/NN
  of/IN
  third-party/JJ
  advisors/NNS
  ;/:
  •/VB
  the/DT
  use/NN
  by/IN
  our/PRP$
  competitors/NNS
  and/CC
  our/PRP$
  clients/NNS
  of/IN
  offshore/JJ
  resources/NNS
  to/TO
  provide/VB
  lower-cost/JJ
  service/NN
  delivery/NN
  capabilities/NNS
  ;/:
  •/VB
  our/PRP$
  ability/NN
  to/TO
  charge/VB
  premium/NN
  prices/NNS
  when/WRB
  justified/VBN
  by/IN
  market/NN
  demand/NN
  or/CC
  the/DT
  type/NN
  of/IN
  service/NN
  ;/:
  and/CC
  •/VB
  general/JJ
  economic/JJ
  and/CC
  political/JJ
  conditions/NNS
  ./.)
(S
  For/IN
  our/PRP$
  internal/JJ
  management/NN
  reporting/NN
  and/CC
  budgeting/NN
  purposes/NNS
  ,/,
  we/PRP
  use/VBP
  non-GAAP/JJ
  financial/JJ
  information/NN
  that/WDT
  does/VBZ
  not/RB
  include/VB
  stock-based/JJ
  compensation/NN
  expense/NN
  ,/,
  acquisition-related/JJ
  charges/NNS
  and/CC
  net/JJ
  non-operating/JJ
  foreign/JJ
  currency/NN
  exchange/NN
  gains/NNS
  or/CC
  losses/NNS
  for/IN
  financial/JJ
  and/CC
  operational/JJ
  decision/NN
  making/NN
  ,/,
  to/TO
  evaluate/VB
  period-to-period/JJ
  comparisons/NNS
  and/CC
  for/IN
  making/VBG
  comparisons/NNS
  of/IN
  our/PRP$
  operating/NN
  results/NNS
  to/TO
  those/DT
  of/IN
  our/PRP$
  competitors/NNS
  ./.)

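I have gone through a lot of links but have not found an approach that fits my purpose: extracting the companies that were tagged as PERSON, ORGANIZATION, and GPE.

If anyone has links with more information on extracting named entities, other than the nltk website, that would be much appreciated.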

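1 answer:

Answer 0: (score: 0)

I applied the code from this link and was able to get the named entities from the output above. Use the nltk.ne_chunk_sents() function instead of nltk.ne_chunk.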
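Since the linked code is not reproduced here, the following is only a minimal sketch of one common way to pull the labelled chunks out of such a tree; it is not necessarily what the link contained. It assumes the tree behaves like the nltk.Tree returned by nltk.ne_chunk(tagged, binary=False), where each child is either a (word, tag) tuple or a subtree exposing .label() and .leaves(). The function name extract_entities and its wanted parameter are hypothetical names chosen for illustration.

```python
def extract_entities(tree, wanted=("PERSON", "ORGANIZATION", "GPE")):
    """Return (label, text) pairs for the named-entity chunks in one sentence tree.

    Assumption: `tree` behaves like the nltk.Tree produced by
    nltk.ne_chunk(tagged, binary=False). Iterating over it yields either
    (word, tag) tuples or chunk subtrees with .label() and .leaves().
    """
    entities = []
    for child in tree:
        # Plain (word, tag) tuples have no .label(); only chunk subtrees do.
        if hasattr(child, "label") and child.label() in wanted:
            # The leaves of a chunk are (word, tag) pairs; keep just the words.
            text = " ".join(word for word, _tag in child.leaves())
            entities.append((child.label(), text))
    return entities
```

Inside the loop from the question, this could be called as extract_entities(namedEnt) in place of print(namedEnt). For the first sentence in the output above it should return pairs such as ('PERSON', 'Accenture') and ('ORGANIZATION', 'Computer Sciences Corporation'). Note that the labels come straight from the chunker, so companies that were mistagged as PERSON or GPE are returned under those labels; passing all three labels in wanted, as the default does, collects them regardless.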