Converting from JSON to CSV?

Time: 2014-02-26 19:36:48

Tags: python json parsing csv nested

I have a file containing lines like the following:

{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f508f-e7c8-32b8-e044-0003ba298018","municipalityCode":"0766","municipalityName":"Hedensted","streetCode":"0072","streetName":"Værnegården","streetBuildingIdentifier":"13","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"8000","districtName":"Århus","presentationString":"Værnegården 13, 8000 Århus","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(553564 6179299)","x":553564,"y":6179299}]}

I want to convert each line into a readable CSV file with a header, as shown below:

status,message,data,addressAccessId,municipalityCode,municipalityName,streetCode,streetName,streetBuildingIdentifier,mailDeliverySublocationIdentifier,districtSubDivisionIdentifier,postCodeIdentifier,districtName,presentationString,addressSpecificCount,validCoordinates,geometryWkt,x,y
OK,OK,data:type,addressAccessType,0a3f508f-e7c8-32b8-e044-0003ba298018,0766,Hedensted,0072,Værnegården,13,,,8000,Århus,Værnegården 13, 8000 Århus,1,true,POINT553564 6179299,553564,6179299

How can I do this? Code and explanations are very welcome. So far, this is what I have come up with, based on this example (How can I convert JSON to CSV?):

x = json.loads(x)

f = csv.writer(open('test.csv', 'wb+'))

# Write CSV Header, If you dont need that, remove this line
f.writerow(['status', 'message', 'type', 'addressAccessId', 'municipalityCode','municipalityName','streetCode','streetName','streetBuildingIdentifier','mailDeliverySublocationIdentifier','districtSubDivisionIdentifier','postCodeIdentifier','districtName','presentationString','addressSpecificCount','validCoordinates','geometryWkt','x','y'])


for x in x:
    f.writerow([x['status'], 
                x['message'], 
                x['data']['type'], 
                x['data']['addressAccessId'],
                x['data']['municipalityCode'],
                x['data']['municipalityName'],
                x['data']['streetCode'],
                x['data']['streetName'],
                x['data']['streetBuildingIdentifier'],
                x['data']['mailDeliverySublocationIdentifier'],
                x['data']['districtSubDivisionIdentifier'],
                x['data']['postCodeIdentifier'],
                x['data']['districtName'],
                x['data']['presentationString'],
                x['data']['addressSpecificCount'],
                x['data']['validCoordinates'],
                x['data']['geometryWkt'],
                x['data']['x'],
                x['data']['y']])

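I have looked at and tried many other solutions, including DictWriter, and replace() and translate() to remove characters, but I have not been able to convert the lines the way I need. The goal is to be able to choose which fields are written to the new file, and to convert x and y to a new coordinate system. For now, though, I am just trying to parse the lines above into a CSV file. Can anyone provide code along with an explanation? Thank you very much for your time.

Here are the first few lines of my addresses.txt: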

    {"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f5081-e039-32b8-e044-0003ba298018","municipalityCode":"0265","municipalityName":"Roskilde","streetCode":"0831","streetName":"Brønsager","streetBuildingIdentifier":"69","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"Svogerslev","postCodeIdentifier":"4000","districtName":"Roskilde","presentationString":"Brønsager 69, 4000 Roskilde","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(690026 6169309)","x":690026,"y":6169309}]}
    {"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f5089-ecab-32b8-e044-0003ba298018","municipalityCode":"0461","municipalityName":"Odense","streetCode":"9505","streetName":"Vægtens Kvarter","streetBuildingIdentifier":"271","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"Holluf Pile","postCodeIdentifier":"5220","districtName":"Odense SØ","presentationString":"Vægtens Kvarter 271, 5220 Odense SØ","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(592191 6135829)","x":592191,"y":6135829}]}
    {"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f507c-adc3-32b8-e044-0003ba298018","municipalityCode":"0165","municipalityName":"Albertslund","streetCode":"0445","streetName":"Skyttehusene","streetBuildingIdentifier":"33","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"2620","districtName":"Albertslund","presentationString":"Skyttehusene 33, 2620 Albertslund","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(711079 6174741)","x":711079,"y":6174741}]}
    {"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f509c-7f57-32b8-e044-0003ba298018","municipalityCode":"0851","municipalityName":"Aalborg","streetCode":"5205","streetName":"Løvstikkevej","streetBuildingIdentifier":"36","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"9000","districtName":"Aalborg","presentationString":"Løvstikkevej 36, 9000 Aalborg","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(552407 6322490)","x":552407,"y":6322490}]}
    {"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f5098-32a6-32b8-e044-0003ba298018","municipalityCode":"0779","municipalityName":"Skive","streetCode":"0462","streetName":"Landevejen","streetBuildingIdentifier":"52","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"Håsum","postCodeIdentifier":"7860","districtName":"Spøttrup","presentationString":"Landevejen 52, 7860 Spøttrup","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(491515 6269739)","x":491515,"y":6269739}]}

2 answers:

Answer 0: (score: 3)

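Note that the data key holds a list of dictionaries, so x['data']['type'] will not work, but x['data'][0]['type'] will. However, there may be more than one such dictionary in that list. I am assuming you want one CSV row per dictionary in x['data'].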
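To make the indexing point concrete, here is a minimal sketch. It assumes Python 3 and uses a shortened, made-up line with two entries in data (the real file appears to have one per line):

```python
import json

# Hypothetical, shortened line with two dictionaries in the "data" list
line = ('{"status":"OK","message":"OK","data":['
        '{"type":"addressAccessType","x":1},'
        '{"type":"addressAccessType","x":2}]}')
entry = json.loads(line)

try:
    entry["data"]["type"]        # indexing a list with a string key
except TypeError as exc:
    error = exc                  # lists only accept integer indices or slices

first_type = entry["data"][0]["type"]   # index into the list first, then the dict
row_count = len(entry["data"])          # one CSV row per dictionary in the list
```

Looping over entry["data"] rather than hard-coding index 0 keeps the code correct if a line ever carries more than one address.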

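Next, you appear to have a UTF-8 BOM on every line; whatever wrote this file was not using the UTF-8 encoding correctly. We need to strip that marker, which is the first three bytes of each line.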
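As a small illustration of the BOM handling, the sketch below works on raw bytes. The helper name strip_bom is an assumption made for this example, not part of any library:

```python
import codecs

def strip_bom(raw_line):
    """Return raw_line without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if raw_line.startswith(codecs.BOM_UTF8):
        return raw_line[len(codecs.BOM_UTF8):]
    return raw_line

# Hypothetical sample line as it might be read from the file in binary mode
sample = codecs.BOM_UTF8 + b'{"status":"OK"}\n'
cleaned = strip_bom(sample)
```

Using len(codecs.BOM_UTF8) instead of a literal 3 avoids a magic number while doing the same thing.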

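Finally, JSON strings are always Unicode data, and your data contains non-ASCII characters, so you have to encode back to byte strings before passing the data to the CSV writer object.

I am using csv.DictWriter here, with a predefined list of field names: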

import codecs
import csv
import json

fields = [
    'status', 'message', 'type', 'addressAccessId', 'municipalityCode', 
    'municipalityName', 'streetCode', 'streetName', 'streetBuildingIdentifier',
    'mailDeliverySublocationIdentifier', 'districtSubDivisionIdentifier',
    'postCodeIdentifier', 'districtName', 'presentationString', 'addressSpecificCount',
    'validCoordinates', 'geometryWkt', 'x', 'y']


with open('test.csv', 'wb') as csvfile, open('jsonfile', 'r') as jsonfile:
    writer = csv.DictWriter(csvfile, fields)
    writer.writeheader()

    for line in jsonfile:
        # Strip the UTF-8 BOM (3 bytes) if this line starts with one
        if line.startswith(codecs.BOM_UTF8):
            line = line[3:]
        entry = json.loads(line)
        # One CSV row per dictionary in the 'data' list
        for item in entry['data']:
            row = dict(item, status=entry['status'], message=entry['message'])
            # Python 2: encode keys and values to UTF-8 byte strings for the csv module
            row = {k.encode('utf8'): unicode(v).encode('utf8') for k, v in row.iteritems()}
            writer.writerow(row)

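The row dictionary is basically a copy of each dictionary in the entry['data'] list, with the status and message keys copied in as well. This makes row a flat dictionary.

I also read your input file line by line, since you said each line contains a separate JSON entry.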
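The code in this answer targets Python 2 (unicode, iteritems, and a CSV file opened in binary mode). For readers on Python 3, a rough equivalent might look like the sketch below. It is an adaptation under stated assumptions, not the answerer's code: the function name convert and the shortened sample line are made up, and the input is assumed to be decoded as UTF-8, so a per-line BOM shows up as the character U+FEFF.

```python
import csv
import io
import json

FIELDS = [
    "status", "message", "type", "addressAccessId", "municipalityCode",
    "municipalityName", "streetCode", "streetName", "streetBuildingIdentifier",
    "mailDeliverySublocationIdentifier", "districtSubDivisionIdentifier",
    "postCodeIdentifier", "districtName", "presentationString",
    "addressSpecificCount", "validCoordinates", "geometryWkt", "x", "y",
]


def convert(json_lines, csv_out):
    """Write one CSV row per dictionary found in each line's 'data' list."""
    writer = csv.DictWriter(csv_out, FIELDS)
    writer.writeheader()
    for line in json_lines:
        # A decoded UTF-8 BOM appears as U+FEFF at the start of the line
        line = line.lstrip("\ufeff").strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        for item in entry["data"]:
            # Flatten: copy the item and add the top-level status and message
            writer.writerow(dict(item, status=entry["status"], message=entry["message"]))


# Shortened, hypothetical sample line; fields missing here are written as empty cells
sample_lines = [
    '\ufeff{"status":"OK","message":"OK","data":[{"type":"addressAccessType",'
    '"streetName":"Værnegården","presentationString":"Værnegården 13, 8000 Århus",'
    '"validCoordinates":true,"x":553564,"y":6179299}]}\n',
]
buffer = io.StringIO()
convert(sample_lines, buffer)
csv_text = buffer.getvalue()
```

With real files, the same function could be called with open('addresses.txt', encoding='utf-8') as input and open('test.csv', 'w', newline='', encoding='utf-8') as output. No manual encoding step is needed in Python 3 because the csv module works on text. Note that JSON true is written as True, since the writer uses Python's string form of the value.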

Answer 1: (score: 0)

Open the output file with csv.DictWriter() and define the output header fields as you specified. Use extrasaction='ignore' and restval='' as options.
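A short sketch of those two options, assuming Python 3 and an in-memory buffer in place of a real file. The column list is an arbitrary selection, and the "note" column is hypothetical, included only to show restval filling a missing key:

```python
import csv
import io

# Only these columns are written; "note" does not exist in the data
wanted = ["streetName", "postCodeIdentifier", "x", "y", "note"]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=wanted, extrasaction="ignore", restval="")
writer.writeheader()
writer.writerow({
    "streetName": "Landevejen",
    "postCodeIdentifier": "7860",
    "x": 491515,
    "y": 6269739,
    "municipalityName": "Skive",   # not in `wanted`, silently dropped
})
result = out.getvalue()
```

With extrasaction='ignore', keys outside fieldnames are skipped instead of raising ValueError, which gives a simple way to choose which fields reach the output file.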

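See "Opening A large JSON file in Python with no newlines for csv conversion Python 2.6.6" for help with handling large files, as I ran into a similar problem. Also take a look at the question I linked to.

I built a similar kind of system from JSON using appropriate loops.

For example,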

def parse_row(currdata):
  outx = {}
  # currdata is one dictionary taken from the x['data'] list
  for eachx in currdata:
    outx[eachx] = currdata[eachx]
  return outx

This is a function that takes currdata as its parameter and is called with x['data'][row] as the input argument.

rows = len(x['data'])
for row in range(rows):
  outx = parse_row(x['data'][row])
  # process the row and create output

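This should let you set up the parsing correctly. I could not copy my actual code into this answer, but this should point you toward a solution.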