I want to extract some features from POS-tagged text. My goal is to retrieve noun-verb combinations as a list. I use spaCy for the POS tagging. Right now my code looks like this:
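# Python 2 / spaCy 1.x, as in my setup
from spacy.de import German

nlp = German()
Verb = ["VERB"]
NN = ["NOUN"]

sentence = [["Du musst folgendes tun: Scheibe schließen, Tuer oeffnen, Fenster anheben"],
            ["Das ist deine Loesung: Sitz zurückstellen"]]
texts = somePreprocessing(sentence)  # tokenization, stopword removal

list2 = []      # parsed documents, grouped per input text
verblist = []   # verbs found in the current sentence
nounlist = []   # nouns found in the current sentence
pairlist = []   # one (verbs, nouns) tuple per sentence that has both

for text in texts:
    docs = []   # was the built-in name `list`, which cannot be appended to
    for s in text:
        st = nlp(unicode(s))
        docs.append(st)
        for word in st:
            if word.pos_ in Verb:
                verblist.append(word)
            if word.pos_ in NN:
                nounlist.append(word)
        if len(verblist) != 0 and len(nounlist) != 0:
            pairlist.append((verblist, nounlist))
        # reset the per-sentence collectors
        verblist = []
        nounlist = []
    list2.append(docs)

print(verblist)  # always empty here, because it was just reset
print(nounlist)  # always empty here, because it was just reset
print(pairlist)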
The output should look like this: [["Scheibe", "schließen", "Tuer", "oeffnen", "Fenster", "anheben"], ["Sitz", "zurückstellen"]]
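To make the target concrete, here is a minimal sketch of the filtering step on its own, without spaCy. The function name `noun_verb_sequence` and the tags on the tokens are assumptions for illustration (hand-assigned, not real tagger output), and the sentence is assumed to end in "anheben" to match the desired output above.

```python
def noun_verb_sequence(tagged_tokens, keep=("NOUN", "VERB")):
    """Return the token texts whose POS tag is in `keep`, in sentence order."""
    return [text for text, pos in tagged_tokens if pos in keep]


# Hand-assigned (token, tag) pairs for the first example sentence.
# These tags are illustrative assumptions, not actual tagger output.
tagged = [
    ("Du", "PRON"), ("musst", "AUX"), ("folgendes", "PRON"), ("tun", "VERB"),
    (":", "PUNCT"), ("Scheibe", "NOUN"), ("schließen", "VERB"), (",", "PUNCT"),
    ("Tuer", "NOUN"), ("oeffnen", "VERB"), (",", "PUNCT"),
    ("Fenster", "NOUN"), ("anheben", "VERB"),
]

print(noun_verb_sequence(tagged))
# ['tun', 'Scheibe', 'schließen', 'Tuer', 'oeffnen', 'Fenster', 'anheben']
```

Note that a plain tag filter also keeps "tun" if it is tagged as a verb, so it would have to be dropped by the stopword removal or by an extra rule to arrive at exactly the list above.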
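To summarize:
Given a list of sentences such as [["This is an example sentence"], ["This is another example sentence"]],
my goal is to retrieve, based on the POS tags, lists like [["Noun", "Verb", "Noun", "Verb", "Noun", "Verb"], ["Noun", "Verb"]].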
listOfSentences = [[".."], [".."]]
pos = posTagger(listOfSentences)
result = matchingNounVerb(pos)
print(result)
=> [["Noun", "Verb", "...", "...", "...", "..."], ["Noun", "Verb"]]
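One possible reading of the pseudocode above, as a rough sketch rather than a finished solution: each noun is paired with the first verb that follows it, and the pairs are flattened into one list per sentence. All names here (`pair_nouns_with_verbs`, `matching_noun_verb`, `toy_tagger`, `TOY_TAGS`) are hypothetical, and the toy tagger is a dictionary lookup that only stands in for a real POS tagger so the example runs without any model.

```python
def pair_nouns_with_verbs(tagged_tokens):
    """Pair each noun with the first verb that follows it.

    If a second noun appears before any verb, it replaces the pending noun.
    Verbs with no pending noun are skipped.
    """
    pairs = []
    pending_noun = None
    for text, pos in tagged_tokens:
        if pos == "NOUN":
            pending_noun = text
        elif pos == "VERB" and pending_noun is not None:
            pairs.append((pending_noun, text))
            pending_noun = None
    return pairs


def matching_noun_verb(sentences, tagger):
    """Return one flat [noun, verb, noun, verb, ...] list per sentence."""
    result = []
    for sentence in sentences:
        flat = []
        for noun, verb in pair_nouns_with_verbs(tagger(sentence)):
            flat.extend([noun, verb])
        result.append(flat)
    return result


# Stand-in tagger (assumption): whitespace split plus a tiny lookup table.
# With spaCy, assuming `nlp` is a loaded German pipeline with a tagger,
# a replacement could be: lambda s: [(t.text, t.pos_) for t in nlp(s)]
TOY_TAGS = {
    "tun": "VERB", "Scheibe": "NOUN", "schließen": "VERB", "Tuer": "NOUN",
    "oeffnen": "VERB", "Fenster": "NOUN", "anheben": "VERB",
    "Loesung": "NOUN", "Sitz": "NOUN", "zurückstellen": "VERB",
}


def toy_tagger(sentence):
    tokens = [tok.strip(",:.") for tok in sentence.split()]
    return [(tok, TOY_TAGS.get(tok, "X")) for tok in tokens if tok]


sentences = [
    "Du musst folgendes tun: Scheibe schließen, Tuer oeffnen, Fenster anheben",
    "Das ist deine Loesung: Sitz zurückstellen",
]
result = matching_noun_verb(sentences, toy_tagger)
print(result)
# [['Scheibe', 'schließen', 'Tuer', 'oeffnen', 'Fenster', 'anheben'],
#  ['Sitz', 'zurückstellen']]
```

With this pairing rule, "tun" is skipped because no noun precedes it, and "Loesung" is dropped because "Sitz" replaces it before a verb appears. Whether that rule is the right one for real text is an open question; it only happens to reproduce the desired output for these two sentences.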
Thanks for your help ;)