I'm new to nltk and am trying to extract PERSON, ORGANIZATION, and GPE entities with the following code:
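import nltk

# tokcomp is assumed to be a list of sentence strings (it is defined elsewhere)
for i in tokcomp:
    words = nltk.word_tokenize(i)      # split the sentence into tokens
    tagged = nltk.pos_tag(words)       # attach a part-of-speech tag to each token
    namedEnt = nltk.ne_chunk(tagged, binary=False)  # group tokens into labeled entity chunks
    print(namedEnt)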
The output I get is:
(S
Our/PRP$
direct/JJ
competitors/NNS
include/VBP
,/,
among/IN
others/NNS
,/,
(PERSON Accenture/NNP)
,/,
(GPE Capgemini/NNP)
,/,
(ORGANIZATION Computer/NNP Sciences/NNPS Corporation/NNP)
,/,
(GPE Genpact/NNP)
,/,
(ORGANIZATION HCL/NNP Technologies/NNPS)
,/,
(ORGANIZATION HP/NNP Enterprise/NNP)
,/,
(ORGANIZATION IBM/NNP Global/NNP Services/NNPS)
,/,
(ORGANIZATION Infosys/NNP Technologies/NNPS)
,/,
(PERSON Tata/NNP Consultancy/NNP Services/NNPS)
and/CC
(PERSON Wipro/NNP)
./.)
(S
These/DT
markets/NNS
also/RB
include/VBP
numerous/JJ
smaller/JJR
local/JJ
competitors/NNS
in/IN
the/DT
various/JJ
geographic/JJ
markets/NNS
in/IN
which/WDT
we/PRP
operate/VBP
which/WDT
may/MD
be/VB
able/JJ
to/TO
provide/VB
services/NNS
and/CC
solutions/NNS
at/IN
lower/JJR
costs/NNS
or/CC
on/IN
terms/NNS
more/RBR
attractive/JJ
to/TO
clients/NNS
than/IN
we/PRP
can/MD
./.)
(S
Our/PRP$
direct/JJ
competitors/NNS
include/VBP
,/,
among/IN
others/NNS
,/,
(PERSON Accenture/NNP)
,/,
(GPE Capgemini/NNP)
,/,
(ORGANIZATION Computer/NNP Sciences/NNPS Corporation/NNP)
,/,
(GPE Genpact/NNP)
,/,
(ORGANIZATION HCL/NNP Technologies/NNPS)
,/,
(ORGANIZATION HP/NNP Enterprise/NNP)
,/,
(ORGANIZATION IBM/NNP Global/NNP Services/NNPS)
,/,
(ORGANIZATION Infosys/NNP Technologies/NNPS)
,/,
(PERSON Tata/NNP Consultancy/NNP Services/NNPS)
and/CC
(PERSON Wipro/NNP)
./.)
(S
The/DT
rates/NNS
we/PRP
are/VBP
able/JJ
to/TO
recover/VB
for/IN
our/PRP$
services/NNS
are/VBP
affected/VBN
by/IN
a/DT
number/NN
of/IN
factors/NNS
,/,
including/VBG
:/:
•/VB
our/PRP$
clients’/JJ
perceptions/NNS
of/IN
our/PRP$
ability/NN
to/TO
add/VB
value/NN
through/IN
our/PRP$
services/NNS
;/:
•/NNP
introduction/NN
of/IN
new/JJ
services/NNS
or/CC
products/NNS
by/IN
us/PRP
or/CC
our/PRP$
competitors/NNS
;/:
•/VB
our/PRP$
competitors’/NN
pricing/NN
policies/NNS
;/:
•/VB
our/PRP$
ability/NN
to/TO
accurately/RB
estimate/VB
,/,
attain/NN
and/CC
sustain/NN
contract/NN
revenues/NNS
,/,
margins/NNS
and/CC
cash/NN
flows/NNS
over/IN
increasingly/RB
longer/JJR
contract/NN
periods/NNS
;/:
•/NNP
bid/NN
practices/NNS
of/IN
clients/NNS
and/CC
their/PRP$
use/NN
of/IN
third-party/JJ
advisors/NNS
;/:
•/VB
the/DT
use/NN
by/IN
our/PRP$
competitors/NNS
and/CC
our/PRP$
clients/NNS
of/IN
offshore/JJ
resources/NNS
to/TO
provide/VB
lower-cost/JJ
service/NN
delivery/NN
capabilities/NNS
;/:
•/VB
our/PRP$
ability/NN
to/TO
charge/VB
premium/NN
prices/NNS
when/WRB
justified/VBN
by/IN
market/NN
demand/NN
or/CC
the/DT
type/NN
of/IN
service/NN
;/:
and/CC
•/VB
general/JJ
economic/JJ
and/CC
political/JJ
conditions/NNS
./.)
(S
For/IN
our/PRP$
internal/JJ
management/NN
reporting/NN
and/CC
budgeting/NN
purposes/NNS
,/,
we/PRP
use/VBP
non-GAAP/JJ
financial/JJ
information/NN
that/WDT
does/VBZ
not/RB
include/VB
stock-based/JJ
compensation/NN
expense/NN
,/,
acquisition-related/JJ
charges/NNS
and/CC
net/JJ
non-operating/JJ
foreign/JJ
currency/NN
exchange/NN
gains/NNS
or/CC
losses/NNS
for/IN
financial/JJ
and/CC
operational/JJ
decision/NN
making/NN
,/,
to/TO
evaluate/VB
period-to-period/JJ
comparisons/NNS
and/CC
for/IN
making/VBG
comparisons/NNS
of/IN
our/PRP$
operating/NN
results/NNS
to/TO
those/DT
of/IN
our/PRP$
competitors/NNS
./.)
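I have gone through many links but could not find an approach that fits my purpose: extracting the companies that have been tagged as PERSON, ORGANIZATION, and GPE.
Any links with more information on extracting named entities, beyond the nltk website, would be greatly appreciated.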
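
For reference, here is a minimal sketch of one way the labeled chunks could be pulled out of a tree like the ones printed above. It relies on the fact that `ne_chunk` returns an `nltk.Tree` whose named-entity chunks are subtrees while ordinary tokens are `(word, tag)` tuples. The helper name `extract_entities`, the `WANTED` set, and the hand-built `sample` tree are illustrative assumptions, not part of nltk or of the code in the question.

```python
from nltk import Tree

# Labels to keep; an assumption based on the three types named in the question.
WANTED = {"PERSON", "ORGANIZATION", "GPE"}

def extract_entities(tree, wanted=WANTED):
    """Return (label, text) pairs for the top-level entity chunks in a tree."""
    entities = []
    for node in tree:
        # Plain tokens are (word, tag) tuples; entity chunks are Tree objects.
        if isinstance(node, Tree) and node.label() in wanted:
            text = " ".join(word for word, tag in node.leaves())
            entities.append((node.label(), text))
    return entities

# Hand-built tree mirroring a fragment of the output above,
# so the sketch runs without downloading any nltk models.
sample = Tree("S", [
    ("include", "VBP"),
    Tree("PERSON", [("Accenture", "NNP")]),
    (",", ","),
    Tree("ORGANIZATION", [("HCL", "NNP"), ("Technologies", "NNPS")]),
    ("and", "CC"),
    Tree("PERSON", [("Wipro", "NNP")]),
])

print(extract_entities(sample))
```

Inside the loop from the question, the same helper could be called as `extract_entities(namedEnt)`. Since the output shows companies tagged inconsistently (Accenture as PERSON, Capgemini as GPE), collecting all three labels and treating them as candidate company names may be more practical than trusting any single label.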