我需要在某些指定单词之前/之后识别某些POS标签,例如以下带标记的句子:
[('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
可以抽象为形式"将是" +形容词
同样:
[('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]
的形式是"能够" +动词
如何在句子中检查这些类型的模式。我正在使用NLTK。
答案 0 :(得分:2)
假设你想逐字逐句地检查“will”后跟“be”,然后是一些形容词,你可以这样做:
def would_be(tagged):
return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in xrange(len(tagged) - 2))
输入是一个POS标记的句子(元组列表,根据NLTK)。
它检查列表中是否有任何三个元素,使得“will”在“be”旁边,“be”在标记为形容词('JJ')的单词旁边。只要这个“模式”匹配,它就会返回True
。
你可以为第二种句子做一些非常类似的事情:
def am_able_to(tagged):
return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in xrange(len(tagged) - 3))
这是该计划的驱动程序:
s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]
def would_be(tagged):
return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in xrange(len(tagged) - 2))
def am_able_to(tagged):
return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in xrange(len(tagged) - 3))
sent1 = ' '.join(s[0] for s in s1)
sent2 = ' '.join(s[0] for s in s2)
print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s1), sent1))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s1), sent1))
print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s2), sent2))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s2), sent2))
这正确输出:
Is 'This feature would be nice to have' of type 'would be' + adj? True
Is 'This feature would be nice to have' of type 'am able to' + verb? False
Is 'I am able to delete the group functionality' of type 'would be' + adj? False
Is 'I am able to delete the group functionality' of type 'am able to' + verb? True
如果您想概括一下,您可以更改是否检查文字或其POS标签。