我正在努力解决这个问题。我有一个JSON文件,并且需要将它发送到CSV,如果结构是扁平的,没有深层嵌套项,那就很好。
但在这种情况下,嵌套的RACES
让我感到困惑。
我如何以这样的格式获取数据:
VENUE, COUNTRY, ITW, RACES__NO, RACES__TIME
对象中的每个对象和每个种族?
{
"1": {
"VENUE": "JOEBURG",
"COUNTRY": "HAE",
"ITW": "XAD",
"RACES": {
"1": {
"NO": 1,
"TIME": "12:35"
},
"2": {
"NO": 2,
"TIME": "13:10"
},
"3": {
"NO": 3,
"TIME": "13:40"
},
"4": {
"NO": 4,
"TIME": "14:10"
},
"5": {
"NO": 5,
"TIME": "14:55"
},
"6": {
"NO": 6,
"TIME": "15:30"
},
"7": {
"NO": 7,
"TIME": "16:05"
},
"8": {
"NO": 8,
"TIME": "16:40"
}
}
},
"2": {
"VENUE": "FOOBURG",
"COUNTRY": "ABA",
"ITW": "XAD",
"RACES": {
"1": {
"NO": 1,
"TIME": "12:35"
},
"2": {
"NO": 2,
"TIME": "13:10"
},
"3": {
"NO": 3,
"TIME": "13:40"
},
"4": {
"NO": 4,
"TIME": "14:10"
},
"5": {
"NO": 5,
"TIME": "14:55"
},
"6": {
"NO": 6,
"TIME": "15:30"
},
"7": {
"NO": 7,
"TIME": "16:05"
},
"8": {
"NO": 8,
"TIME": "16:40"
}
}
}, ...
}
我想将此输出为CSV:
VENUE, COUNTRY, ITW, RACES__NO, RACES__TIME
JOEBERG, HAE, XAD, 1, 12:35
JOEBERG, HAE, XAD, 2, 13:10
JOEBERG, HAE, XAD, 3, 13:40
...
...
FOOBURG, ABA, XAD, 1, 12:35
FOOBURG, ABA, XAD, 2, 13:10
所以首先我得到正确的密钥:
self.keys = self.data.keys()
keys = ["DATA_KEY"]
for key in self.keys:
if type(self.data[key]) == dict:
for k in self.data[key].keys():
if k not in keys:
if type(self.data[key][k]) == unicode:
keys.append(k)
elif type(self.data[key][k]) == dict:
self.subkey = k
for sk in self.data[key][k].values():
for subkey in sk.keys():
subkey = "%s__%s" % (self.subkey, subkey)
if subkey not in keys:
keys.append(subkey)
然后添加数据:
但是怎么样?
对于熟练的forloopers来说,这应该是一个有趣的。 ;-)
答案 0 :(得分:3)
我只为第一个对象收集密钥,然后假设格式的其余部分是一致的。
以下代码还将嵌套对象限制为 one ;你没有具体说明当有多个时应该发生什么。有两个或多个相同长度的嵌套结构可以工作(你可以将它们拉在一起),但是如果你有不同长度的结构,你需要明确选择如何处理它们; zip用空列填充,或写出这些条目的产品(A x B行,每次找到B条目时重复A的信息)。
import csv
from operator import itemgetter
with open(outputfile, 'wb') as outf:
writer = None # will be set to a csv.DictWriter later
for key, item in sorted(data.items(), key=itemgetter(0)):
row = {}
nested_name, nested_items = '', {}
for k, v in item.items():
if not isinstance(v, dict):
row[k] = v
else:
assert not nested_items, 'Only one nested structure is supported'
nested_name, nested_items = k, v
if writer is None:
# build fields for each first key of each nested item first
fields = sorted(row)
# sorted keys of first item in key sorted order
nested_keys = sorted(sorted(nested_items.items(), key=itemgetter(0))[0][1])
fields.extend('__'.join((nested_name, k)) for k in nested_keys)
writer = csv.DictWriter(outf, fields)
writer.writeheader()
for nkey, nitem in sorted(nested_items.items(), key=itemgetter(0)):
row.update(('__'.join((nested_name, k)), v) for k, v in nitem.items())
writer.writerow(row)
对于您的样本输入,这会产生:
COUNTRY,ITW,VENUE,RACES__NO,RACES__TIME
HAE,XAD,JOEBURG,1,12:35
HAE,XAD,JOEBURG,2,13:10
HAE,XAD,JOEBURG,3,13:40
HAE,XAD,JOEBURG,4,14:10
HAE,XAD,JOEBURG,5,14:55
HAE,XAD,JOEBURG,6,15:30
HAE,XAD,JOEBURG,7,16:05
HAE,XAD,JOEBURG,8,16:40
ABA,XAD,FOOBURG,1,12:35
ABA,XAD,FOOBURG,2,13:10
ABA,XAD,FOOBURG,3,13:40
ABA,XAD,FOOBURG,4,14:10
ABA,XAD,FOOBURG,5,14:55
ABA,XAD,FOOBURG,6,15:30
ABA,XAD,FOOBURG,7,16:05
ABA,XAD,FOOBURG,8,16:40