I want to convert the following nltk Tree representation to JSON format:
Desired output:
{
  "scores": {
    "filler": [
      [
        "scores"
      ],
      [
        "for"
      ]
    ],
    "extent": [
      "highest"
    ],
    "team": [
      "India"
    ]
  }
}
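Answer 0 (score: 3)
It looks like the input tree may contain children with the same name. To support the general case, you can convert each Tree into a dictionary that maps its name to a list of its children: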
from nltk import Tree  # $ pip install nltk

def tree2dict(tree):
    # NLTK 3 exposes the node name as tree.label(); older releases used tree.node
    return {tree.label(): [tree2dict(t) if isinstance(t, Tree) else t
                           for t in tree]}
Example:
import json
import sys

tree = Tree('scores',
            [Tree('extent', ['highest']),
             Tree('filler',
                  [Tree('filler', ['scores']),
                   Tree('filler', ['for'])]),
             Tree('team', ['India'])])
d = tree2dict(tree)
json.dump(d, sys.stdout, indent=2)
Output:
{
  "scores": [
    {
      "extent": [
        "highest"
      ]
    },
    {
      "filler": [
        {
          "filler": [
            "scores"
          ]
        },
        {
          "filler": [
            "for"
          ]
        }
      ]
    },
    {
      "team": [
        "India"
      ]
    }
  ]
}
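The output above keeps every child as its own single-key object, which is why it is more deeply nested than the output asked for in the question. If same-named children should be grouped under one key instead, one possible post-processing step is sketched below. It works on the plain dict produced by tree2dict, so it needs no NLTK. The names `merge_children` and `_text` are hypothetical, and the result still keeps the inner "filler" level, so it approximates the desired output rather than reproducing it.

```python
import json

def merge_children(node):
    """Turn a single-key {label: [children]} dict into (label, value),
    grouping children that share a label."""
    (label, children), = node.items()
    # A node whose children are all plain strings keeps its list of leaves.
    if all(isinstance(c, str) for c in children):
        return label, children
    grouped = {}
    for child in children:
        if isinstance(child, dict):
            name, value = merge_children(child)
        else:
            # "_text" is a made-up key for leaves that sit next to subtrees.
            name, value = "_text", child
        grouped.setdefault(name, []).append(value)
    # A label that occurs once is unwrapped; repeated labels stay as a list.
    return label, {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}

# The same structure that tree2dict returns for the example tree.
nested = {"scores": [
    {"extent": ["highest"]},
    {"filler": [{"filler": ["scores"]}, {"filler": ["for"]}]},
    {"team": ["India"]},
]}

label, value = merge_children(nested)
merged = {label: value}
print(json.dumps(merged, indent=2))
```

Unwrapping single occurrences makes the shape depend on how many same-named children a node has, so a consumer has to handle both a bare value and a list for the same key.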
Answer 1 (score: 2)
Convert the tree to a dict, then convert the dict to JSON.
import json
import nltk

def tree_to_dict(tree):
    tdict = {}
    for t in tree:
        # NLTK 3 uses t.label(); older releases used t.node
        if isinstance(t, nltk.Tree) and isinstance(t[0], nltk.Tree):
            tdict[t.label()] = tree_to_dict(t)
        elif isinstance(t, nltk.Tree):
            tdict[t.label()] = t[0]
    return tdict

def dict_to_json(d):
    return json.dumps(d)

# tree is the nltk.Tree from the question
output_json = dict_to_json({tree.label(): tree_to_dict(tree)})
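Note that assigning into tdict by label means a later child replaces an earlier sibling with the same label, so the two inner "filler" nodes from the question would collapse into one. A minimal sketch of a variant that appends to a list per label instead is below. `tree_to_dict_multi` is a hypothetical name, the fallback `Tree` class is only a stand-in so the snippet runs without NLTK installed, and like the original it assumes every subtree has at least one child.

```python
import json

try:
    from nltk import Tree
except ImportError:
    class Tree(list):
        """Minimal stand-in for nltk.Tree, used only if NLTK is missing."""
        def __init__(self, label, children):
            super().__init__(children)
            self._label = label

        def label(self):
            return self._label

def tree_to_dict_multi(tree):
    """Map each child label to a list of values so siblings are not overwritten."""
    out = {}
    for t in tree:
        if not isinstance(t, Tree):
            continue  # bare leaves at this level are skipped, as in the answer
        value = tree_to_dict_multi(t) if isinstance(t[0], Tree) else t[0]
        out.setdefault(t.label(), []).append(value)
    return out

tree = Tree('scores',
            [Tree('extent', ['highest']),
             Tree('filler',
                  [Tree('filler', ['scores']),
                   Tree('filler', ['for'])]),
             Tree('team', ['India'])])

result = tree_to_dict_multi(tree)
print(json.dumps(result))
# {"extent": ["highest"], "filler": [{"filler": ["scores", "for"]}], "team": ["India"]}
```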
Answer 2 (score: 2)
Convert the tree to a dictionary keyed by tree label; you can then use json.dumps to turn it into JSON.
import nltk  # "import nltk.tree.Tree" fails because Tree is a class, not a module

def tree_to_dict(tree):
    tree_dict = dict()
    leaves = []
    for subtree in tree:
        if isinstance(subtree, nltk.Tree):
            tree_dict.update(tree_to_dict(subtree))
        else:
            # leaves are expected to be (token, tag) tuples
            (expression, tag) = subtree
            leaves.append(expression)
    tree_dict[tree.label()] = " ".join(leaves)
    return tree_dict
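The unpacking `(expression, tag) = subtree` only works when every leaf is a (token, tag) tuple, as in POS-tagged trees. The tree in the question has plain string leaves, which would raise a ValueError there. A small helper that accepts both forms could look like this; `leaf_text` is a hypothetical name, not an NLTK function.

```python
def leaf_text(leaf):
    """Return the token text for a (token, tag) tuple or a plain string leaf."""
    return leaf[0] if isinstance(leaf, tuple) else str(leaf)

# Mixed leaves: two tagged tokens and one untagged string.
leaves = [("Tom", "NNP"), "and", ("Larry", "NNP")]
joined = " ".join(leaf_text(leaf) for leaf in leaves)
print(joined)  # Tom and Larry
```

Inside tree_to_dict, `leaves.append(leaf_text(subtree))` would then take the place of the tuple unpacking.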
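Answer 3 (score: 0)
A related alternative. For my purposes I did not need to preserve the exact tree; instead I wanted to extract the entity types as keys and their tokens as lists of values. For the sentence "Tom and Larry play for the Patriots." I wanted the following JSON: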
{
  "PERSON": [
    "Tom",
    "Larry"
  ],
  "ORGANIZATION": [
    "Patriots"
  ]
}
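This preserves the order of the tokens (per entity type) and also avoids "stomping" on values already stored under an entity key. You can reuse the same json.dump code from the other answers to turn this dict into JSON.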
import nltk
from nltk import tag, chunk, tokenize

def prep(sentence):
    return chunk.ne_chunk(tag.pos_tag(tokenize.word_tokenize(sentence)))

t = prep("Tom and Larry play for the Patriots.")

def tree_to_dict(tree):
    tree_dict = dict()
    for st in tree:
        # not everything gets a NE tag,
        # so we can ignore untagged tokens
        # which are stored in tuples
        if isinstance(st, nltk.Tree):
            if st.label() in tree_dict:
                tree_dict[st.label()] = tree_dict[st.label()] + [st[0][0]]
            else:
                tree_dict[st.label()] = [st[0][0]]
    return tree_dict

print(tree_to_dict(t))
# {'PERSON': ['Tom', 'Larry'], 'ORGANIZATION': ['Patriots']}
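One limitation: `st[0][0]` keeps only the first token of each entity, so a multi-word name would be cut short. A sketch of a variant that joins all tokens of an entity is below. `entities_by_type` is a hypothetical name, the chunked tree is built by hand for illustration rather than taken from real ne_chunk output, and the fallback `Tree` class is only a stand-in so the snippet runs without NLTK installed.

```python
from collections import defaultdict

try:
    from nltk import Tree
except ImportError:
    class Tree(list):
        """Minimal stand-in for nltk.Tree, used only if NLTK is missing."""
        def __init__(self, label, children):
            super().__init__(children)
            self._label = label

        def label(self):
            return self._label

def entities_by_type(chunked):
    """Group full entity strings by their label, keeping sentence order."""
    found = defaultdict(list)
    for st in chunked:
        # Untagged tokens are plain (token, tag) tuples and are skipped.
        if isinstance(st, Tree):
            found[st.label()].append(" ".join(token for token, _tag in st))
    return dict(found)

# Hand-built example in the flat shape that ne_chunk produces.
chunked = Tree("S", [
    Tree("PERSON", [("Tom", "NNP")]),
    ("plays", "VBZ"), ("for", "IN"), ("the", "DT"),
    Tree("ORGANIZATION", [("New", "NNP"), ("England", "NNP"), ("Patriots", "NNPS")]),
])

entities = entities_by_type(chunked)
print(entities)
# {'PERSON': ['Tom'], 'ORGANIZATION': ['New England Patriots']}
```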