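I have CSV columns as shown below, and I am trying to convert them into the name/children/size format that D3 expects in JSON. Some children are duplicated: for example, under name="Type" there is children="young" with size=40000.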
L1 L2 L3 L4 L5 L6 Size
Type cars young young young young 40000
Type cars student US US US 10000
Type cars student UK UK UK 20000
Type cars Graduates Young India Delhi 20000
Type cars Graduates Old UK London 30000
Type Bike Undergrads CB CB UNC 6000
prime prime prime prime prime prime 600
The output I get is:
{
    "name": "Segments",
    "children": [
        {
            "name": "Type",
            "children": [
                {
                    "name": "cars",
                    "children": [
                        {
                            "name": "young",
                            "children": [
                                {
                                    "name": "young",
                                    "children": [
                                        {
                                            "name": "young",
                                            "children": [
                                                {
                                                    "name": "young",
                                                    "size": "40000"
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        },
                        {
                            "name": "student",
                            "children": [
                                {
                                    "name": "US",
                                    "children": [
                                        {
                                            "name": "US",
                                            "children": [
                                                {
                                                    "name": "US",
                                                    "size": "10000"
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "name": "UK",
                                    "children": [
                                        {
                                            "name": "UK",
                                            "children": [
                                                {
                                                    "name": "UK",
                                                    "size": "20000"
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        },
        {
            "name": "prime",
            "children": [
                {
                    "name": "prime",
                    "children": [
                        {
                            "name": "prime",
                            "children": [
                                {
                                    "name": "prime",
                                    "children": [
                                        {
                                            "name": "prime",
                                            "children": [
                                                {
                                                    "name": "prime",
                                                    "size": "600"
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
The expected output is:
{
    "name": "Segments",
    "children": [
        {
            "name": "Type",
            "children": [
                {
                    "name": "cars",
                    "children": [
                        {
                            "name": "young",
                            "size": "40000"
                        }
                    ]
                },
                {
                    "name": "student",
                    "children": [
                        {
                            "name": "US",
                            "size": "10000"
                        },
                        {
                            "name": "UK",
                            "size": "20000"
                        }
                    ]
                }
            ]
        },
        {
            "name": "prime",
            "size": "600"
        }
    ]
}
I am using the following code:
import json
import csv


class Node(object):
    def __init__(self, name, size=None):
        self.name = name
        self.children = []
        self.size = size

    def child(self, cname, size=None):
        # Return the existing child named cname, or create and attach a new one.
        child_found = [c for c in self.children if c.name == cname]
        if not child_found:
            _child = Node(cname, size)
            self.children.append(_child)
        else:
            _child = child_found[0]
        return _child

    def as_dict(self):
        # Leaves carry a size; inner nodes carry a list of children.
        res = {'name': self.name}
        if self.size is None:
            res['children'] = [c.as_dict() for c in self.children]
        else:
            res['size'] = self.size
        return res


root = Node('Segments')

with open('C:\\Users\\G01172472\\Desktop\\Book3.csv', 'r') as f:
    reader = csv.reader(f)
    p = list(reader)

# Skip the header row (index 0) and add one path per data row.
for row in range(1, len(p)):
    grp1, grp2, grp3, grp4, grp5, grp6, size = p[row]
    root.child(grp1).child(grp2).child(grp3).child(grp4).child(grp5).child(grp6, size)

print(json.dumps(root.as_dict(), indent=4))
Answer 0: (score: 1)
So, you first want to remove the duplicates from each row and then create the children accordingly.
Here is what I changed:
with open('C:\\Users\\G01172472\\Desktop\\Book3.csv', 'r') as f:
    reader = csv.reader(f)
    p = list(reader)

for row in range(1, len(p)):
    # Create a temporary list of the row but keep only unique elements
    temp = []
    for x in p[row]:
        if x not in temp:
            temp.append(x)

    ## Additional code according to your dictionary structure
    #if row != 1:
    #    if 'cars' in temp:
    #        temp.remove('cars')
    #    elif 'Bike' in temp:
    #        temp.remove('Bike')

    # Create a string which will look similar to root.child(grp1)...
    # The last element of temp is the size, so only the names are looped over.
    evalStr = 'root'
    for i in range(len(temp) - 1):
        if i == len(temp) - 2:
            evalStr += '.child("' + temp[i] + '","' + temp[-1] + '")'
        else:
            evalStr += '.child("' + temp[i] + '")'

    # eval(string) will evaluate the string as python code
    eval(evalStr)

print(json.dumps(root.as_dict(), indent=2))
Let me know if it works.
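The same chain of child() calls can also be built with a plain loop instead of a string passed to eval, which avoids problems when a name contains a quote character. This is only a minimal sketch, assuming the Node class from the question and Python 3.7 or later; the helpers dedupe and insert_row are hypothetical names introduced for illustration, and the rows are copied inline from the question rather than read from the CSV file.

```python
import json


class Node(object):
    """Same tree node as in the question."""

    def __init__(self, name, size=None):
        self.name = name
        self.children = []
        self.size = size

    def child(self, cname, size=None):
        # Reuse an existing child with this name, otherwise create one.
        for c in self.children:
            if c.name == cname:
                return c
        new_child = Node(cname, size)
        self.children.append(new_child)
        return new_child

    def as_dict(self):
        res = {'name': self.name}
        if self.size is None:
            res['children'] = [c.as_dict() for c in self.children]
        else:
            res['size'] = self.size
        return res


def dedupe(names):
    # Hypothetical helper: drop repeated names, keeping first-seen order.
    return list(dict.fromkeys(names))


def insert_row(root, row):
    # Hypothetical helper: the last column is the size, the rest are names.
    *names, size = row
    path = dedupe(names)
    node = root
    for name in path[:-1]:
        node = node.child(name)
    # Only the final name on the path becomes a leaf carrying the size.
    node.child(path[-1], size)


# Sample rows copied from the question (header row already skipped).
rows = [
    ['Type', 'cars', 'young', 'young', 'young', 'young', '40000'],
    ['Type', 'cars', 'student', 'US', 'US', 'US', '10000'],
    ['prime', 'prime', 'prime', 'prime', 'prime', 'prime', '600'],
]

root = Node('Segments')
for row in rows:
    insert_row(root, row)

print(json.dumps(root.as_dict(), indent=2))
```

Note that this keeps "student" nested under "cars", as the CSV rows have it. The expected output in the question places "student" next to "cars", which is what the commented-out lines in the answer above are meant to handle.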
Answer 1: (score: 1)
First, you need to remove the duplicates from the row. This can be done as follows:
p[row] = ('Type', 'cars', 'young', 'young', 'young', 'young', 'Size')
pp = set()
new_p_row = [el for el in p[row] if not (el in pp or pp.add(el))]
# ['Type', 'cars', 'young', 'Size']
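The comprehension above works because set.add returns None, which can be hard to read at first glance. As a sketch of two standard-library alternatives: dict.fromkeys removes every repeated value (the same result as the line above, on Python 3.7 or later), while itertools.groupby collapses only runs of adjacent repeats. The variable names here are illustrative and not part of the original answer.

```python
from itertools import groupby

row = ['Type', 'cars', 'young', 'young', 'young', 'young']

# Remove every repeated value, keeping first-seen order (Python 3.7+).
unique_all = list(dict.fromkeys(row))

# Collapse only runs of adjacent repeats.
unique_adjacent = [key for key, _ in groupby(row)]

print(unique_all)       # ['Type', 'cars', 'young']
print(unique_adjacent)  # ['Type', 'cars', 'young']

# The two approaches differ when a name reappears later in the row.
mixed = ['a', 'b', 'a']
mixed_all = list(dict.fromkeys(mixed))
mixed_adjacent = [key for key, _ in groupby(mixed)]

print(mixed_all)       # ['a', 'b']
print(mixed_adjacent)  # ['a', 'b', 'a']
```

Which one fits depends on whether a name that legitimately reappears deeper in the hierarchy should be kept.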
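Then walk down from the root, adding each element as a child of the previous one, up to (but not including) the last two:
node = root
for r in new_p_row[:-2]:
    node = node.child(r)
Finally, add the second-to-last element as the leaf under that node, with the last element as its size:
node.child(new_p_row[-2], new_p_row[-1])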