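I'm trying to take a collection of strings, tokenize each one into individual characters, and reassemble them into JSON in order to build a cluster dendrogram visualization (similar to this word tree, but with characters within strings rather than words within sentences). Some character sequences are therefore shared across the data, since several strings begin with the same prefix.
For example, suppose I have a text file that looks like this: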
xin_qn2
x_qing4n3
x_qing4nian_
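That is everything I expect in the file; there is no CSV header or anything else attached to the data. The resulting JSON object would then look like: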
{
  "name": "x",
  "children": [
    {
      "name": "i"
    },
    {
      "name": "_",
      "children": [
        {
          "name": "q"
        }
      ]
    }
  ]
}
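and so on. I have been trying to build the data ahead of time, before sending it to D3.js, using Ruby to split the lines into individual characters, but I can't figure out how to build the hierarchical JSON structure.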
file_contents = File.open("single.txt", "r")
file_contents.readlines.each do |line|
  parse = line.scan(/[A-Za-z][^A-Za-z]*/)
  puts parse
end
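As a side check on the tokenizing step, here is a minimal Python sketch (Python 3 assumed; it is not part of the original attempt) that applies the same pattern to one sample line. It suggests the pattern groups each letter with the non-letters that follow it, so `_` and the digits do not come out as separate tokens, whereas a plain character split does yield one token per character.

```python
import re

line = "x_qing4n3"  # one of the sample lines from the question

# Same pattern as the Ruby scan: a letter followed by any run of non-letters.
grouped = re.findall(r"[A-Za-z][^A-Za-z]*", line)
print(grouped)  # ['x_', 'q', 'i', 'n', 'g4', 'n3']

# One token per character, including '_' and the digits.
single = list(line)
print(single)  # ['x', '_', 'q', 'i', 'n', 'g', '4', 'n', '3']
```

If single characters are the goal, the plain split is probably the simpler choice.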
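I could do this in the browser with d3.js, but I haven't tried that yet.
I'm just wondering whether there are any suggestions, pointers, or existing tools/scripts that might help. Thanks!
Update 2014-10-02
So I spent some time trying this in Python, but I keep getting stuck. I also see now that I'm not handling the "children" elements correctly. Any suggestions?
Attempt 1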
#!/usr/bin/python
from collections import defaultdict
import json

def tree():
    return defaultdict(tree)

file_out = open('out.txt', 'wb')
nested = defaultdict(tree)
with open("single.txt") as f:
    for line in f:
        o = list(line)
        char_lst = []
        for chars in o:
            d = {}
            d['name']=chars
            char_lst.append(d)
        for word in d:
            node = nested
            for char in word:
                node = node[char.lower()]
                print node
print(json.dumps(nested))
Attempt 2
#!/usr/bin/python
from collections import defaultdict
import json

def tree():
    return defaultdict(tree)

nested = defaultdict(tree)
words = list(open("single.txt"))
words_output = open("out.json", "wb")
for word in words:
    node = nested
    for char in word:
        node = node[char.lower()]

def print_nested(d, indent=0):
    for k, v in d.iteritems():
        print '{}{!r}:'.format(indent * ' ', k)
        print_nested(v, indent + 1)

print_nested(nested)
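Answer 0 (score: 1)
You are almost there with attempt #2. Adding print(json.dumps(nested)) at the end prints the following JSON (shown with the trailing newline of each line stripped, as the complete code at the end does with splitlines()):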
{
  "x": {
    "i": {
      "n": {
        "_": {
          "q": {
            "n": {
              "2": {}
            }
          }
        }
      }
    },
    "_": {
      "q": {
        "i": {
          "n": {
            "g": {
              "4": {
                "n": {
                  "i": {
                    "a": {
                      "n": {
                        "_": {}
                      }
                    }
                  },
                  "3": {}
                }
              }
            }
          }
        }
      }
    }
  }
}
Close, but not quite what you want. By the way, you can also convert the nested defaultdict into a regular dictionary with the following function:
def convert(d):
    # Recursively turn nested defaultdicts into plain dicts; other values pass through.
    if isinstance(d, defaultdict):
        return dict((key, convert(value)) for (key, value) in d.items())
    return d
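To see what the conversion buys you, here is a small sketch (Python 3 assumed; the two sample words are made up for illustration). Since a defaultdict is a dict subclass, json.dumps serializes it directly, so the conversion mainly gives you plain dicts that print cleanly and no longer create keys on lookup.

```python
from collections import defaultdict
import json

def tree():
    return defaultdict(tree)

def convert(d):
    # Recursively replace every defaultdict with a plain dict.
    if isinstance(d, defaultdict):
        return {key: convert(value) for key, value in d.items()}
    return d

nested = tree()
for word in ["ab", "ac"]:  # hypothetical sample words
    node = nested
    for char in word:
        node = node[char]  # a missing key creates a new subtree

plain = convert(nested)
print(type(plain) is dict)                # True
print(json.dumps(plain, sort_keys=True))  # {"a": {"b": {}, "c": {}}}
```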
But we still only have a dict of dicts (of dicts ...). Using recursion, we can convert it into the format you need, as follows:
def format(d):
    # Turn {char: subtree} into a list of {"name": ..., "children": [...]} nodes.
    children = []
    for (key, value) in d.items():
        children.append({"name": key, "children": format(value)})
    return children
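As a quick worked example of that recursion, here is a sketch on a tiny hand-written dict (Python 3 assumed). The helper name to_children is hypothetical; it does the same thing as format() above but avoids shadowing the built-in, and it sorts the keys only so that the output order is predictable.

```python
import json

def to_children(d):
    # Each key becomes a node; its sub-dict becomes that node's children list.
    return [{"name": key, "children": to_children(value)}
            for key, value in sorted(d.items())]

small = {"x": {"i": {}, "_": {"q": {}}}}  # hand-written sample input
result = to_children(small)
print(json.dumps(result, indent=2))
```

Note that leaves end up with an empty "children" list rather than no "children" key at all.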
Finally, let's print the JSON:
print(json.dumps(format(convert(nested))))
This prints the following JSON (formatted for clarity):
[
  {
    "name": "x",
    "children": [
      {
        "name": "i",
        "children": [
          {
            "name": "n",
            "children": [
              {
                "name": "_",
                "children": [
                  {
                    "name": "q",
                    "children": [
                      {
                        "name": "n",
                        "children": [
                          {
                            "name": "2",
                            "children": []
                          }
                        ]
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "name": "_",
        "children": [
          {
            "name": "q",
            "children": [
              {
                "name": "i",
                "children": [
                  {
                    "name": "n",
                    "children": [
                      {
                        "name": "g",
                        "children": [
                          {
                            "name": "4",
                            "children": [
                              {
                                "name": "n",
                                "children": [
                                  {
                                    "name": "i",
                                    "children": [
                                      {
                                        "name": "a",
                                        "children": [
                                          {
                                            "name": "n",
                                            "children": [
                                              {
                                                "name": "_",
                                                "children": []
                                              }
                                            ]
                                          }
                                        ]
                                      }
                                    ]
                                  },
                                  {
                                    "name": "3",
                                    "children": []
                                  }
                                ]
                              }
                            ]
                          }
                        ]
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
]
Here is the complete code:
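#!/usr/bin/python
from collections import defaultdict
import json

def tree():
    return defaultdict(tree)

nested = defaultdict(tree)

# splitlines() drops the trailing newline of each line.
with open("single.txt") as f:
    words = f.read().splitlines()

# Build the nested structure: one dict level per character.
for word in words:
    node = nested
    for char in word:
        node = node[char.lower()]

def convert(d):
    # Recursively turn nested defaultdicts into plain dicts; other values pass through.
    if isinstance(d, defaultdict):
        return dict((key, convert(value)) for (key, value) in d.items())
    return d

def format(d):
    # Turn {char: subtree} into a list of {"name": ..., "children": [...]} nodes.
    children = []
    for (key, value) in d.items():
        children.append({"name": key, "children": format(value)})
    return children

result = json.dumps(format(convert(nested)))
print(result)

# Also save the JSON to out.json.
with open("out.json", "w") as words_output:
    words_output.write(result)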
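The code above returns a list and gives every leaf an empty "children" list, while the JSON in the question is a single root object whose leaves have no "children" key. If you need that exact shape, here is one possible variation (a sketch, Python 3 assumed). The helper names build_trie and to_node and the fallback label "root" are my own, and the sample words are inlined instead of being read from single.txt.

```python
import json

def build_trie(words):
    # Plain nested dicts: one level per character, shared prefixes merge.
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char.lower(), {})
    return root

def to_node(name, subtree):
    # Leaves get no "children" key; keys are sorted for a stable order.
    node = {"name": name}
    if subtree:
        node["children"] = [to_node(k, v) for k, v in sorted(subtree.items())]
    return node

words = ["xin_qn2", "x_qing4n3", "x_qing4nian_"]  # sample lines from the question
trie = build_trie(words)

if len(trie) == 1:
    # All strings share a first character, so it can serve as the root.
    (name, subtree), = trie.items()
    root = to_node(name, subtree)
else:
    # Assumption: otherwise wrap everything in a synthetic root node.
    root = to_node("root", trie)

print(json.dumps(root, indent=2))
```

With the three sample lines, the root is the "x" node, which is the kind of single-object hierarchy that D3 layouts usually expect.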