I have implemented the following data structure:
class Node(object):
    """Rules:
    A node's children are ONLY an iterable of nodes
    A leaf node must NOT have children and MUST have a word
    """
    def __init__(self, tag, children=None, word=u""):
        assert isinstance(tag, unicode) and isinstance(word, unicode)
        self.tag = tag
        self.word = word
        self.parent = None  # Set by recursive function
        # Use None as the default so instances do not share one mutable list.
        # Can only be an iterable of nodes now.
        self.children = children if children is not None else []
        for child in self.children:
            child.parent = self

    def matches(self, node):
        """Match RECURSIVELY down!"""
        # WILDCARD is a module-level constant defined elsewhere (not shown here).
        # Note: zip() stops at the shorter child list, so extra children are ignored.
        if self.tag == node.tag:
            if all(mine.matches(theirs)
                   for mine, theirs in zip(self.children, node.children)):
                if self.word != WILDCARD and node.word != WILDCARD:
                    return self.word == node.word
                else:
                    return True
        return False

    def __unicode__(self):
        childrenU = u", ".join(map(unicode, self.children))
        return u"(%s, %s, %s)" % (self.tag, childrenU, self.word)

    def __str__(self):
        return unicode(self).encode('utf-8')

    def __repr__(self):
        # In Python 2, __repr__ must return a byte string, not unicode.
        return str(self)
So a tree is basically a bunch of nodes linked together.
I am parsing S-expressions such as the following: (VP (VP(VC w1) (NP (CP (IP (NP(NN w2)) (VP (ADVP(AD w3)) (VP(VA w4)))) (DEC w5)) (NP(NN w6)))) (ADVP(AD w7)))
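For reference, here is a minimal sketch of how such a string could be turned into a tree. It is not the parser I actually use: `Tree` and `parse_sexpr` are hypothetical names, `Tree` is a lightweight stand-in for the `Node` class above, and the sketch assumes every leaf has the form `(TAG word)`.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for Node: tag, tuple of child Trees, and leaf word.
Tree = namedtuple("Tree", ["tag", "children", "word"])


def parse_sexpr(text):
    """Parse '(TAG child ...)' or '(TAG word)' into a Tree."""
    # Tokens are '(' , ')' or any run of characters without spaces/parens.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = [0]  # one-element list so the nested function can advance it

    def parse():
        assert tokens[pos[0]] == "(", "expected '('"
        pos[0] += 1
        tag = tokens[pos[0]]
        pos[0] += 1
        children, word = [], ""
        while tokens[pos[0]] != ")":
            if tokens[pos[0]] == "(":
                children.append(parse())  # nested subtree
            else:
                word = tokens[pos[0]]     # leaf word
                pos[0] += 1
        pos[0] += 1  # consume ')'
        return Tree(tag, tuple(children), word)

    tree = parse()
    assert pos[0] == len(tokens), "unexpected trailing tokens"
    return tree


example = parse_sexpr("(VP (ADVP(AD w3)) (VP(VA w4)))")
```

Here `example.tag` is `VP`, and the word `w3` sits at `example.children[0].children[0].word`.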
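What I want to write is a function that matches a subtree against a larger tree. The catch is that the subtree contains wildcards, and I want to be able to capture whatever they match.
For example, given the subtree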
(VP
  (ADVP (AD X))
  (VP (VA Y)))
the "match" operation against the tree above should return {X: w3, Y: w4}.
Can anyone recommend an efficient, simple solution?
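To make the desired behaviour concrete, here is a rough sketch of the kind of function I have in mind. It makes several assumptions: `Tree`, `leaf`, `match_at` and `find_matches` are hypothetical names, `Tree` stands in for the `Node` class above, wildcard names are passed in explicitly as a set, and a pattern node must have exactly as many children as the node it matches. It simply tries the pattern at every node, so the worst case is roughly (tree size) x (pattern size).

```python
from collections import namedtuple

# Hypothetical stand-in for Node: tag, tuple of child Trees, and leaf word.
Tree = namedtuple("Tree", ["tag", "children", "word"])


def leaf(tag, word):
    return Tree(tag, (), word)


def match_at(pattern, node, variables):
    """Return a dict of wildcard bindings if pattern matches at node, else None."""
    if pattern.tag != node.tag or len(pattern.children) != len(node.children):
        return None
    bindings = {}
    if pattern.word in variables:
        bindings[pattern.word] = node.word  # wildcard captures the word
    elif pattern.word != node.word:
        return None
    for p_child, n_child in zip(pattern.children, node.children):
        sub = match_at(p_child, n_child, variables)
        if sub is None:
            return None
        for name, value in sub.items():
            # The same wildcard must bind to the same word everywhere.
            if bindings.setdefault(name, value) != value:
                return None
    return bindings


def find_matches(pattern, tree, variables):
    """Yield bindings for every node of tree (pre-order) where pattern matches."""
    result = match_at(pattern, tree, variables)
    if result is not None:
        yield result
    for child in tree.children:
        for found in find_matches(pattern, child, variables):
            yield found


# The IP subtree of the sentence above, built by hand.
tree = Tree("IP", (
    Tree("NP", (leaf("NN", "w2"),), ""),
    Tree("VP", (
        Tree("ADVP", (leaf("AD", "w3"),), ""),
        Tree("VP", (leaf("VA", "w4"),), ""),
    ), ""),
), "")
pattern = Tree("VP", (
    Tree("ADVP", (leaf("AD", "X"),), ""),
    Tree("VP", (leaf("VA", "Y"),), ""),
), "")
matches = list(find_matches(pattern, tree, {"X", "Y"}))
```

With these inputs, `matches` holds a single dict that binds `X` to `w3` and `Y` to `w4`.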