I want to use Stanford NER in Python to align two different sentences that contain the same NER categories ('PERSON', 'LOCATION', 'ORGANIZATION').
sentence1 = 'John is reading a booklet in London Library'
sentence2 = 'Michael reads a brochure in England Library'
The result I want looks like this:
result = [[[u'John', u'PERSON'], [u'Michael', u'PERSON']], [[u'London', u'LOCATION'], [u'England', u'LOCATION']]]
Here is the code I tried:
import os
from nltk.tag import StanfordNERTagger

def alignNER(sent1, sent2):
    # Point NLTK at the local Java executable
    java_path = 'C:/program files/java/jre/bin/java.exe'
    os.environ['JAVAHOME'] = java_path
    st = StanfordNERTagger('usr/english.all.3class.distsim.crf.ser','usr/stanford-ner.jar')
    result = []
    # Tag each token of the first sentence and record entity hits with their index
    for i in xrange(len(sent1)):
        if sent1[i]:
            word = sent1[i]
            temp = st.tag(word.split())
            for token, tag in temp:
                if tag in ['PERSON','LOCATION','ORGANIZATION']:
                    result.append([temp,i])
    return result
    # Same pass over the second sentence
    for j in xrange(len(sent2)):
        if sent2[j]:
            word = sent2[j]
            temp = st.tag(word.split())
            for token, tag in temp:
                if tag in ['PERSON','LOCATION','ORGANIZATION']:
                    result.append([temp,j])
    return result
When I call it:
checkNER = alignNER(sent1, sent2)
print checkNER
the result only shows sentence 1, with the position index of each detected category:
[[[u'John', u'PERSON'], 0], [[u'London', u'LOCATION'], 6]]
Can anyone help? Thanks.
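
One thing visible in the code above is that the first `return result` exits the function as soon as the loop over `sent1` finishes, so the loop over `sent2` never runs. Beyond that, the pairing described in the desired result can be sketched roughly as below. This is only an illustration under assumptions: `align_ner`, `CATEGORIES`, `tagged1` and `tagged2` are hypothetical names, and the tagged lists are written by hand in the `(token, tag)` shape that `st.tag(tokens)` returns, following the tags shown in the question rather than real tagger output (the real model may well tag these sentences differently). Entities are paired by position within each category, which is one possible reading of "align".

```python
# Sketch only: pairs up named entities from two already-tagged sentences.
# The tagged lists below are hand-written stand-ins for st.tag(tokens) output.

CATEGORIES = ('PERSON', 'LOCATION', 'ORGANIZATION')


def align_ner(tagged1, tagged2, categories=CATEGORIES):
    """Pair the n-th entity of each category in tagged1 with the n-th in tagged2."""
    result = []
    for category in categories:
        # Collect [token, tag] entries of this category, in sentence order.
        ents1 = [[tok, tag] for tok, tag in tagged1 if tag == category]
        ents2 = [[tok, tag] for tok, tag in tagged2 if tag == category]
        # zip stops at the shorter list, so unmatched entities are dropped.
        for e1, e2 in zip(ents1, ents2):
            result.append([e1, e2])
    return result


# Hypothetical tagger output for the two example sentences.
tagged1 = [('John', 'PERSON'), ('is', 'O'), ('reading', 'O'), ('a', 'O'),
           ('booklet', 'O'), ('in', 'O'), ('London', 'LOCATION'), ('Library', 'O')]
tagged2 = [('Michael', 'PERSON'), ('reads', 'O'), ('a', 'O'), ('brochure', 'O'),
           ('in', 'O'), ('England', 'LOCATION'), ('Library', 'O')]

aligned = align_ner(tagged1, tagged2)
print(aligned)
```

With a real tagger, each sentence would be tagged once as a whole (for example `st.tag(sentence1.split())`) and the two tagged lists passed in, instead of tagging word by word inside the loop.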