Question

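I am using nltk trees to read Stanford syntactic parses of text (via Tree.fromstring()), and I want to find the leaf position of a given subtree within the larger tree. Basically, I would like the opposite of leaf_treeposition().

In a tree t, I have a subtree np, and what I want is the index x such that: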

t.leaves()[x] == np.leaves()[0] # x = ???(t, np)

I don't want to use t.leaves().index(...), because np may occur several times in the sentence, and I need the right occurrence rather than the first one.

What I do have is the tree position of np within t (t is a ParentedTree), namely:

np.treeposition()

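Given that t[np.treeposition()] == np, I suppose a tedious solution would be to sum up the leaves of all left siblings at every level. Alternatively, I could iterate over all leaves until leaf_treeposition() points inside np, but that does not sound optimal.

Is there a better way?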
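For reference, the sibling-summing idea could be sketched roughly as follows. This is only an illustration under assumptions: the tree is modeled as plain nested lists with strings as leaves, and the names count_leaves and leaves_before are made up for this sketch. Since nltk's Tree subclasses list, the same loop should in principle accept t and np.treeposition() as well, but that is not verified here.

```python
def count_leaves(node):
    """Count the leaves (strings) below a node of a nested-list tree."""
    if isinstance(node, str):
        return 1
    return sum(count_leaves(child) for child in node)


def leaves_before(tree, position):
    """Number of leaves to the left of the subtree at `position`.

    Walks down the path given by `position` and, at every level,
    adds up the leaves of all siblings left of the path.
    """
    count = 0
    node = tree
    for index in position:
        for sibling in node[:index]:
            count += count_leaves(sibling)
        node = node[index]
    return count


# Hypothetical stand-in for
# (S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))
# with node labels omitted.
tree = [[["the"], ["dog"]], [["chased"], [["the"], ["cat"]]]]

x = leaves_before(tree, (1, 1))  # position of the second NP
print(x)  # 3
```

The work is proportional to the size of the siblings visited, so it avoids building a list of every leaf position, at the cost of more code.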

Answer 1

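Edit: there is a simple solution after all:

1. Construct the tree position of the first leaf of the subtree.
2. Look it up in the list of all leaf tree positions.

Setup: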

>>> from nltk.tree import ParentedTree
>>> t = ParentedTree.fromstring('(S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))')
>>> np_pos = (1,1)
>>> np = t[np_pos]
>>> print(np)
(NP (D the) (N cat))

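For step 1, I concatenate the tree position of np within t with the position of the first leaf within np. The list of all leaf tree positions (step 2) had me stumped until I looked closer and realized that it is actually implemented in the Tree API, if somewhat obscurely: pass the special value "leaves" as the order argument of treepositions(). The x you are after is simply the index of target_leafpos in this list.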

>>> target_leafpos = np.treeposition() + np.leaf_treeposition(0) # Step 1
>>> all_leaf_treepositions = t.treepositions("leaves")           # Step 2
>>> x = all_leaf_treepositions.index(target_leafpos)
>>> print(x)
3
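To make the two steps more concrete without nltk, here is a rough pure-Python stand-in. It is an assumption-laden sketch, not nltk's implementation: the tree is a nested list with strings as leaves, and leaf_positions and first_leaf_position are hypothetical helpers that only mimic what treepositions("leaves") and leaf_treeposition(0) are used for above.

```python
def leaf_positions(node, prefix=()):
    """List the tree position of every leaf, left to right."""
    if isinstance(node, str):
        return [prefix]
    positions = []
    for i, child in enumerate(node):
        positions.extend(leaf_positions(child, prefix + (i,)))
    return positions


def first_leaf_position(node):
    """Position of the leftmost leaf, relative to `node`."""
    position = ()
    while not isinstance(node, str):
        node = node[0]
        position += (0,)
    return position


# Hypothetical stand-in for
# (S (NP (D the) (N dog)) (VP (V chased) (NP (D the) (N cat))))
# with node labels omitted.
tree = [[["the"], ["dog"]], [["chased"], [["the"], ["cat"]]]]
np_pos = (1, 1)
subtree = tree[1][1]

# Step 1: absolute position of the subtree's first leaf.
target_leafpos = np_pos + first_leaf_position(subtree)
# Step 2: look it up among all leaf positions.
x = leaf_positions(tree).index(target_leafpos)
print(target_leafpos, x)  # (1, 1, 0, 0) 3
```

Because positions are unique, the lookup finds the intended occurrence even when the same words appear earlier in the sentence.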

If you don't mind unreadable code, you can even write it as a one-liner:

x = t.treepositions("leaves").index( np.treeposition()+np.leaf_treeposition(0) )
