我有一个包含以下内容的CSV文件:
District,Zone,Geographical Region,Development Region,Causalities,In Number
Sindhupalchok,Bagmati,Mountain,Central,Total No. of Houses,66688
Sindhupalchok,Bagmati,Mountain,Central,Total Population,287798
Sindhupalchok,Bagmati,Mountain,Central,Dead Male,1497
Sindhupalchok,Bagmati,Mountain,Central,Dead Female,1943
Kathmandu,Bagmati,Hill,Central,Total No. of Houses,436344
Kathmandu,Bagmati,Hill,Central,Total Population,1744240
Kathmandu,Bagmati,Hill,Central,Dead Male,621
Kathmandu,Bagmati,Hill,Central,Dead Female,600
我的目标是从中生成这样的JSON对象:
{
"district":{
"Sindhupalchok":{
"Causalities":{
"Total No. of Houses":66688,
"Total Population":287798,
"Dead Male":1497,
"Dead Female":1943
},
"geoInfo":{
"Zone":"Bagmati",
"geography":"Mountain",
"Dev Region":"Central"
}
},
"Kathmandu":{
"Causalities":{
"Total No. of Houses":436344,
"Total Population":1744240,
"Dead Male":621,
"Dead Female":600
},
"geoInfo":{
"Zone":"Bagmati",
"geography":"Hill",
"Dev Region":"Central"
}
}
}
}
我尝试过使用csv.DictReader(csvfile,fieldnames),但它会在JSON中生成冗余节点,这些节点难以解析且不必要的长度。
我正在使用python 2.x. 这是我到目前为止的尝试:
>>> csvData = open('data.csv','rb')
>>> fieldnames = ("district", "zone", "geographicalRegion", "developmentRegion", "causalities", "injuredNumber")
>>> reader = csv.DictReader(csvData, fieldnames)
>>> rawJson = json.dumps([ row for row in reader ])
rawJson不是我一直在追求的人。它只是将字段名称与各个数据集进行映射。
所以问题是:如何在没有冗余节点的情况下创建这个JSON对象?
答案 0 :(得分:4)
正如glibdud在评论中提到的那样,您需要手动循环数据,以便您可以创建所需的JSON结构。
我们将{CSV}数据的每一行都作为dict
阅读,并检查我们是否遇到过新区,如果是,我们会为data
dict
创建一个新区域它,并将geoInfo
dict
插入data
。然后我们可以收集该线路的伤亡数据以及该区域的后续线路。一旦我们收集了所有数据,我们就可以将data
dict
插入主all_data
dict
。
为了测试代码,我将.csv数据放入名为' qdata.csv'
的文件中import csv
import json
filename = 'qdata.csv'
fieldnames = ('district', 'Zone', 'geography',
'Dev Region', 'casualties', 'injured')
geo_keys = ('Zone', 'geography', 'Dev Region')
all_data = {}
with open(filename, 'rb') as csvfile:
reader = csv.DictReader(csvfile, fieldnames)
# skip header
next(reader)
current_district = None
for row in reader:
district = row['district']
if district != current_district:
current_district = district
data = all_data[district] = {}
casualties = data['Casualties'] = {}
data['geoInfo'] = dict((k, row[k]) for k in geo_keys)
casualties[row['casualties']] = row['injured']
print json.dumps(all_data, indent=4, sort_keys=True)
{
"Kathmandu": {
"Casualties": {
"Dead Female": "600",
"Dead Male": "621",
"Total No. of Houses": "436344",
"Total Population": "1744240"
},
"geoInfo": {
"Dev Region": "Central",
"Zone": "Bagmati",
"geography": "Hill"
}
},
"Sindhupalchok": {
"Casualties": {
"Dead Female": "1943",
"Dead Male": "1497",
"Total No. of Houses": "66688",
"Total Population": "287798"
},
"geoInfo": {
"Dev Region": "Central",
"Zone": "Bagmati",
"geography": "Mountain"
}
}
}
此输出不是完全您在问题中得到的内容,但我认为您应该可以从此处获取。 :)