如何将多层嵌套json转换为sql表

时间:2016-11-07 01:07:15

标签: python mysql json pandas

在StackOverflow的帮助下,我迄今为止得到了这个。需要更多帮助将JSON转换为SQL表。任何帮助都非常感谢。

{
    "Volumes": [{
        "AvailabilityZone": "us-east-1a",
        "Attachments": [{
            "AttachTime": "2013-12-18T22:35:00.000Z",
            "InstanceId": "i-1234567890abcdef0",
            "VolumeId": "vol-049df61146c4d7901",
            "State": "attached",
            "DeleteOnTermination": true,
            "Device": "/dev/sda1",

            "Tags": [{
                "Value": "DBJanitor-Private",
                "Key": "Name"
            }, {
                "Value": "DBJanitor",
                "Key": "Owner"
            }, {
                "Value": "Database",
                "Key": "Product"
            }, {
                "Value": "DB Janitor",
                "Key": "Portfolio"
            }, {
                "Value": "DB Service",
                "Key": "Service"
            }]
        }],
            "Ebs": {
                                "Status": "attached",
                                "DeleteOnTermination": true,
                                "VolumeId": "vol-049df61146c4d7901",
                                "AttachTime": "2016-09-14T19:49:11.000Z"
                            },
        "VolumeType": "standard",
        "VolumeId": "vol-049df61146c4d7901"
    }]
}

借助StackOverFlow的帮助,我能够解决直到标签。无法弄清楚如何解决Ebs片。我对编码很陌生,任何帮助都是深刻的。

In [1]: fn = r'D:\temp\.data\40454898.json'

In [2]: with open(fn) as f:
   ...:     data = json.load(f)
   ...:

In [14]: t = pd.io.json.json_normalize(data['Volumes'],
    ...:                               ['Attachments','Tags'],
    ...:                               [['Attachments', 'VolumeId'],
    ...:                                ['Attachments', 'InstanceId']])
    ...:

In [15]: t
Out[15]:
         Key              Value Attachments.InstanceId   Attachments.VolumeId
0       Name  DBJanitor-Private    i-1234567890abcdef0  vol-049df61146c4d7901
1      Owner          DBJanitor    i-1234567890abcdef0  vol-049df61146c4d7901
2    Product           Database    i-1234567890abcdef0  vol-049df61146c4d7901
3  Portfolio         DB Janitor    i-1234567890abcdef0  vol-049df61146c4d7901
4    Service         DB Service    i-1234567890abcdef0  vol-049df61146c4d7901

由于

1 个答案:

答案 0 :(得分:2)

json_normalize需要列表字典,如果是Ebs - 它只是一个字典,所以我们应该预处理JSON数据:

In [88]: with open(fn) as f:
    ...:     data = json.load(f)
    ...:

In [89]: for r in data['Volumes']:
    ...:     if 'Ebs' not in r: # add 'Ebs' dict if it's not in the record...
    ...:         r['Ebs'] = []
    ...:     if not isinstance(r['Ebs'], list): # wrap 'Ebs' in a list if it's not a list 
    ...:         r['Ebs'] = [r['Ebs']]
    ...:

In [90]: data
Out[90]:
{'Volumes': [{'Attachments': [{'AttachTime': '2013-12-18T22:35:00.000Z',
     'DeleteOnTermination': True,
     'Device': '/dev/sda1',
     'InstanceId': 'i-1234567890abcdef0',
     'State': 'attached',
     'Tags': [{'Key': 'Name', 'Value': 'DBJanitor-Private'},
      {'Key': 'Owner', 'Value': 'DBJanitor'},
      {'Key': 'Product', 'Value': 'Database'},
      {'Key': 'Portfolio', 'Value': 'DB Janitor'},
      {'Key': 'Service', 'Value': 'DB Service'}],
     'VolumeId': 'vol-049df61146c4d7901'}],
   'AvailabilityZone': 'us-east-1a',
   'Ebs': [{'AttachTime': '2016-09-14T19:49:11.000Z',
     'DeleteOnTermination': True,
     'Status': 'attached',
     'VolumeId': 'vol-049df61146c4d7901'}],
   'VolumeId': 'vol-049df61146c4d7901',
   'VolumeType': 'standard'}]}

注意:'Ebs': {..}已替换为'Ebs': [{..}]

In [91]: e = pd.io.json.json_normalize(data['Volumes'],
    ...:                               ['Ebs'],
    ...:                               ['VolumeId'],
    ...:                               meta_prefix='parent_')
    ...:


In [92]: e
Out[92]:
                 AttachTime DeleteOnTermination    Status               VolumeId        parent_VolumeId
0  2016-09-14T19:49:11.000Z                True  attached  vol-049df61146c4d7901  vol-049df61146c4d7901