How do I convert JSON to a pd.DataFrame in Python?

Time: 2019-10-04 21:20:30

Tags: json python-3.x pandas

I have a JSON file that I need to convert to a Pandas DataFrame.

json:

{'@odata.context': 'http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule', 'days': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'], 'times': ['00:30'], 'enabled': False, 'localTimeZoneId': 'UTC', 'notifyOption': 'MailOnFailure'}

I tried the code below, but every attempt returned:

ValueError: arrays must all be same length

1)

2) I also tried using "" as suggested in similar Stack Overflow questions:

test_df = pd.DataFrame(
                    {'@odata.context': 'http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule', 
                     'days': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'], 
                     'times': ['00:30'], 
                     'enabled': False, 
                     'localTimeZoneId': 'UTC', 
                     'notifyOption': 'MailOnFailure'})

3)

test_df = pd.DataFrame(
                    {"@odata.context": "http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule", 
                     "days": ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"], 
                     "times": ["00:30"], 
                     "enabled": False, 
                     "localTimeZoneId": "UTC", 
                     "notifyOption": "MailOnFailure"})

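1 answer:

Answer 0: (score: 0)

It all depends on the file:

  • The first problem is getting the data out of the file, which depends on how the file is formatted.
  • The sample data used here is five repeated rows of the single-quoted data at the top of the question.

If the file is a bunch of dicts, one per line: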

{'@odata.context': 'http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule', 'days': ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'], 'times': ['00:30'], 'enabled': False, 'localTimeZoneId': 'UTC', 'notifyOption': 'MailOnFailure'}
{...}
{...}
{...}
{...}

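Code to create the DataFrame:

  • encoding="utf8" can be removed if it is not needed.
  • pandas.io.json.json_normalize normalizes semi-structured JSON data into a flat table (since pandas 1.0 it is exposed as pandas.json_normalize).
    • This means it will flatten the nested list.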
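import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
from ast import literal_eval

# The file holds Python-style dicts (single quotes, False), so each line is
# parsed with literal_eval rather than json.loads.
line_list = []
with open("test.json", encoding="utf8") as f:
    for line in f:
        line_list.append(literal_eval(line))

# One row per entry in 'days'; the remaining keys are repeated as metadata.
df = json_normalize(line_list, ['days'], ['@odata.context', 'enabled', 'localTimeZoneId', 'notifyOption', 'times'])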
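
For a quick check without a file on disk, the same line-by-line approach can be sketched against an in-memory string. This is a minimal sketch, assuming pandas 1.0 or later (where the function is available as pd.json_normalize). The two-line sample text and the shortened '@odata.context' value are made up for illustration, and 'times' is left out of the metadata to keep the sketch short.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical stand-in for test.json: one Python-style dict per line.
sample = (
    "{'@odata.context': 'ctx', 'days': ['Sunday', 'Monday'], 'times': ['00:30'], "
    "'enabled': False, 'localTimeZoneId': 'UTC', 'notifyOption': 'MailOnFailure'}\n"
    "{'@odata.context': 'ctx', 'days': ['Friday'], 'times': ['00:30'], "
    "'enabled': False, 'localTimeZoneId': 'UTC', 'notifyOption': 'MailOnFailure'}\n"
)

# Parse each non-empty line into a dict.
records = [literal_eval(line) for line in io.StringIO(sample) if line.strip()]

# One output row per entry in 'days'; the scalar fields are repeated as metadata.
df = pd.json_normalize(
    records,
    record_path=["days"],
    meta=["@odata.context", "enabled", "localTimeZoneId", "notifyOption"],
)

# The record column is named 0 by default; give it a readable name.
df = df.rename(columns={0: "day"})
```

Renaming the default 0 column is optional, but it makes later column selection easier to read.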

If the file is dicts inside a list:

[{...},
 {...},
 {...},
 {...},
 {...}]

Code to create the DataFrame:

# The whole file is one Python-style list of dicts, so it is parsed in a single call.
with open("test.json", encoding="utf8") as f:
    data = literal_eval(f.read())

df = json_normalize(data, ['days'], ['@odata.context', 'enabled', 'localTimeZoneId', 'notifyOption', 'times'])
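
Which parser works depends on how the file was written: literal_eval handles Python-style text (single quotes, False), while strict JSON (double quotes, false) needs the json module. Below is a rough sketch of a loader that tries JSON first and falls back to literal_eval. The helper name parse_records and the short sample strings are hypothetical, not part of the original answer.

```python
import json
from ast import literal_eval


def parse_records(text):
    """Parse text holding a list of dicts, as strict JSON or as Python literals."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not valid JSON (e.g. single quotes or False/True/None):
        # fall back to Python literal syntax.
        return literal_eval(text)


python_style = "[{'enabled': False, 'days': ['Sunday', 'Monday']}]"
json_style = '[{"enabled": false, "days": ["Sunday", "Monday"]}]'

# Both spellings parse to the same list of dicts.
same = parse_records(python_style) == parse_records(json_style)
```

The result of parse_records can be passed to json_normalize in the same way as data above.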

DataFrame output:

         0                                                                                                      @odata.context enabled localTimeZoneId   notifyOption  times
    Sunday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
    Monday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
   Tuesday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
 Wednesday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
  Thursday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
    Friday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
  Saturday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
    Sunday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
    Monday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
   Tuesday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
 Wednesday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
  Thursday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
    Friday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
  Saturday  http://analysis.windows.net/v1.0/myorg/groups//$metadata#Microsoft.PowerBI.ServiceContracts.Api.V1.RefreshSchedule   False             UTC  MailOnFailure  00:30
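
The ValueError in the question arises because pd.DataFrame treats each list in a dict as a column, and 'days' (7 items) and 'times' (1 item) differ in length. If one row per schedule is preferred over one row per day, wrapping the dict in a list keeps each list as a single cell. A minimal sketch with shortened, illustrative values, assuming pandas 0.25 or later for explode:

```python
import pandas as pd

# Shortened, made-up version of the schedule from the question.
schedule = {
    "days": ["Sunday", "Monday", "Tuesday"],
    "times": ["00:30"],
    "enabled": False,
    "localTimeZoneId": "UTC",
}

# A list holding one dict gives a one-row frame; list values stay intact as cells.
one_row = pd.DataFrame([schedule])

# explode() then expands 'days' so that each day gets its own row.
per_day = one_row.explode("days").reset_index(drop=True)
```

Here one_row keeps the lists as they are, while per_day resembles the json_normalize result, except that 'times' still holds a list in each row.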