检测POS标签模式以及指定的单词

时间:2016-01-08 08:59:03

标签: python nlp nltk pos-tagger

我需要在某些指定单词之前/之后识别某些POS标签,例如以下带标记的句子:

[('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]

可以抽象为形式"将是" +形容词

同样:

[('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

的形式是"能够" +动词

如何在句子中检查这些类型的模式。我正在使用NLTK。

1 个答案:

答案 0 :(得分:2)

假设你想逐字逐句地检查“will”后跟“be”,然后是一些形容词,你可以这样做:

def would_be(tagged):
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in xrange(len(tagged) - 2))

输入是一个POS标记的句子(元组列表,根据NLTK)。

它检查列表中是否有任何三个元素,使得“will”在“be”旁边,“be”在标记为形容词('JJ')的单词旁边。只要这个“模式”匹配,它就会返回True

你可以为第二种句子做一些非常类似的事情:

def am_able_to(tagged):
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in xrange(len(tagged) - 3))

这是该计划的驱动程序:

s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

def would_be(tagged):
   return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in xrange(len(tagged) - 2))

def am_able_to(tagged):
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in xrange(len(tagged) - 3))

sent1 = ' '.join(s[0] for s in s1)
sent2 = ' '.join(s[0] for s in s2)

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s1), sent1))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s1), sent1))

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s2), sent2))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s2), sent2))

这正确输出:

Is 'This feature would be nice to have' of type 'would be' + adj? True
Is 'This feature would be nice to have' of type 'am able to' + verb? False
Is 'I am able to delete the group functionality' of type 'would be' + adj? False
Is 'I am able to delete the group functionality' of type 'am able to' + verb? True

如果您想概括一下,您可以更改是否检查文字或其POS标签。