How do I read JSON retrieved from an API and save it to a CSV file?

Asked: 2019-04-02 09:51:11

Tags: python json csv

I am using a weather API that returns JSON as its response. Here is a sample of the returned readings:

{
  'data': {
    'request': [{
      'type': 'City',
      'query': 'Karachi, Pakistan'
    }],
    'weather': [{
      'date': '2019-03-10',
      'astronomy': [{
        'sunrise': '06:46 AM',
        'sunset': '06:38 PM',
        'moonrise': '09:04 AM',
        'moonset': '09:53 PM',
        'moon_phase': 'Waxing Crescent',
        'moon_illumination': '24'
      }],
      'maxtempC': '27',
      'maxtempF': '80',
      'mintempC': '22',
      'mintempF': '72',
      'totalSnow_cm': '0.0',
      'sunHour': '11.6',
      'uvIndex': '7',
      'hourly': [{
        'time': '24',
        'tempC': '27',
        'tempF': '80',
        'windspeedMiles': '10',
        'windspeedKmph': '16',
        'winddirDegree': '234',
        'winddir16Point': 'SW',
        'weatherCode': '116',
        'weatherIconUrl': [{
          'value': 'http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0002_sunny_intervals.png'
        }],
        'weatherDesc': [{
          'value': 'Partly cloudy'
        }],
        'precipMM': '0.0',
        'humidity': '57',
        'visibility': '10',
        'pressure': '1012',
        'cloudcover': '13',
        'HeatIndexC': '25',
        'HeatIndexF': '78',
        'DewPointC': '15',
        'DewPointF': '59',
        'WindChillC': '24',
        'WindChillF': '75',
        'WindGustMiles': '12',
        'WindGustKmph': '19',
        'FeelsLikeC': '25',
        'FeelsLikeF': '78',
        'uvIndex': '0'
      }]
    }]
  }
}

I tried to read the data stored in the JSON file with the following Python code:

import simplejson as json 
data_file = open("new.json", "r") 
values = json.load(data_file)

But this produces the following error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0) error

I would also like to know how to save the results to a CSV file in a structured format using Python.

1 answer:

Answer 0: (score: -1)

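As Rami mentions below, the simplest way to do this is to use pandas, either with pd.read_json() or with pd.DataFrame.from_dict(). The problem in this case, however, is that the dictionary/JSON is nested. What do I mean by nested? If you simply put it into a dataframe, you get: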

print (df)
                                          request                                            weather
0  {'type': 'City', 'query': 'Karachi, Pakistan'}  {'date': '2019-03-10', 'astronomy': [{'sunrise...
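A minimal sketch of how a frame like the one printed above could be produced. The `data` dict here is a trimmed, hand-copied subset of the sample response, not the full payload:

```python
import pandas as pd

# Trimmed copy of the sample response from the question (assumption: only a
# few keys are kept to keep the example short).
data = {
    "data": {
        "request": [{"type": "City", "query": "Karachi, Pakistan"}],
        "weather": [{"date": "2019-03-10", "maxtempC": "27", "mintempC": "22"}],
    }
}

# Each top-level key becomes a column; the nested dicts stay as single cells.
df = pd.DataFrame.from_dict(data["data"])

print(df.columns.tolist())
print(type(df.loc[0, "weather"]))
```

Every cell holds a whole dictionary, which is why writing this frame straight to CSV would not give one column per reading.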

If that is what you want, that's fine. However, I am assuming you want all the data for one instance flattened into a single row.

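So you will need either to use json_normalize to unravel it (possible, but you need to be sure the JSON file always follows the same format/keys, and you would still need to pull out each dictionary within each list within each dictionary), or to use a function that flattens the nested JSON, after which you can simply write the result to a file.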
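For the json_normalize route mentioned above, a rough sketch might look like the following. It assumes pandas 1.0 or later, where the function is exposed as `pd.json_normalize`, and it uses a trimmed `weather` list rather than the full response. The `meta` keys are chosen so they do not clash with keys inside `hourly` (for example, `uvIndex` appears at both levels and would need a `meta_prefix`):

```python
import pandas as pd

# Trimmed stand-in for data['data']['weather'] (assumption: most keys omitted).
weather = [{
    "date": "2019-03-10",
    "maxtempC": "27",
    "mintempC": "22",
    "hourly": [
        {"time": "24", "tempC": "27", "humidity": "57"},
    ],
}]

# One output row per entry in 'hourly', with the day-level fields repeated.
hourly = pd.json_normalize(
    weather,
    record_path="hourly",
    meta=["date", "maxtempC", "mintempC"],
)

print(hourly.columns.tolist())
```

This only works if every response has the same keys, and lists nested inside `hourly` (such as `weatherDesc`) would still arrive as unexpanded cells.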

I chose to flatten it with a function and then construct the dataframe:

import re

import pandas as pd


data = {'data': {'request': [{'type': 'City', 'query': 'Karachi, Pakistan'}], 'weather': [{'date': '2019-03-10', 'astronomy': [{'sunrise': '06:46 AM', 'sunset': '06:38 PM', 'moonrise': '09:04 AM', 'moonset': '09:53 PM', 'moon_phase': 'Waxing Crescent', 'moon_illumination': '24'}], 'maxtempC': '27', 'maxtempF': '80', 'mintempC': '22', 'mintempF': '72', 'totalSnow_cm': '0.0', 'sunHour': '11.6', 'uvIndex': '7', 'hourly': [{'time': '24', 'tempC': '27', 'tempF': '80', 'windspeedMiles': '10', 'windspeedKmph': '16', 'winddirDegree': '234', 'winddir16Point': 'SW', 'weatherCode': '116', 'weatherIconUrl': [{'value': 'http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0002_sunny_intervals.png'}], 'weatherDesc': [{'value': 'Partly cloudy'}], 'precipMM': '0.0', 'humidity': '57', 'visibility': '10', 'pressure': '1012', 'cloudcover': '13', 'HeatIndexC': '25', 'HeatIndexF': '78', 'DewPointC': '15', 'DewPointF': '59', 'WindChillC': '24', 'WindChillF': '75', 'WindGustMiles': '12', 'WindGustKmph': '19', 'FeelsLikeC': '25', 'FeelsLikeF': '78', 'uvIndex': '0'}]}]}}

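def flatten_json(y):
    """Flatten nested dicts/lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            # Descend into each key, extending the prefix with the key name.
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            # Descend into each element, extending the prefix with its index.
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            # Leaf value: store it under the prefix minus the trailing '_'.
            out[name[:-1]] = x

    flatten(y)
    return out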


flat = flatten_json(data['data'])


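results = pd.DataFrame()
special_cols = []

columns_list = list(flat.keys())
for item in columns_list:
    # The first list index in the key (e.g. the 0 in 'weather_0_date')
    # becomes the row number.
    try:
        row_idx = re.findall(r'_(\d+)_', item)[0]
    except IndexError:
        # Keys without a list index apply to every row; add them afterwards.
        special_cols.append(item)
        continue
    # Everything after that first index, with underscores removed,
    # becomes the column name.
    column = re.findall(r'_\d+_(.*)', item)[0]
    column = column.replace('_', '')

    row_idx = int(row_idx)
    value = flat[item]

    results.loc[row_idx, column] = value

for item in special_cols:
    results[item] = flat[item]

# Replace 'path/filename.csv' with your own output path.
results.to_csv('path/filename.csv', index=False)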

Output:

print (results.to_string())
   type              query        date astronomy0sunrise astronomy0sunset astronomy0moonrise astronomy0moonset astronomy0moonphase astronomy0moonillumination maxtempC maxtempF mintempC mintempF totalSnowcm sunHour uvIndex hourly0time hourly0tempC hourly0tempF hourly0windspeedMiles hourly0windspeedKmph hourly0winddirDegree hourly0winddir16Point hourly0weatherCode                        hourly0weatherIconUrl0value hourly0weatherDesc0value hourly0precipMM hourly0humidity hourly0visibility hourly0pressure hourly0cloudcover hourly0HeatIndexC hourly0HeatIndexF hourly0DewPointC hourly0DewPointF hourly0WindChillC hourly0WindChillF hourly0WindGustMiles hourly0WindGustKmph hourly0FeelsLikeC hourly0FeelsLikeF hourly0uvIndex
0  City  Karachi, Pakistan  2019-03-10          06:46 AM         06:38 PM           09:04 AM          09:53 PM     Waxing Crescent                         24       27       80       22       72         0.0    11.6       7          24           27           80                    10                   16                  234                    SW                116  http://cdn.worldweatheronline.net/images/wsymb...            Partly cloudy             0.0              57                10            1012                13                25                78               15               59                24                75                   12                  19                25                78              0
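On the JSONDecodeError from the question: "Expecting value: line 1 column 1 (char 0)" generally means the parser found nothing it could parse at the very start of the file, for example because the file is empty. Separately, the sample shown in the question uses single quotes, which is a Python dict repr and not valid JSON. The sketch below is one possible way to handle both cases and write a single flattened row with the standard library only. It uses the stdlib `json` module in place of simplejson, and `load_payload`, `flatten` and `write_csv` are hypothetical helper names, not part of any library:

```python
import ast
import csv
import json


def load_payload(path):
    """Read a saved response, accepting strict JSON or a Python dict repr."""
    with open(path, "r", encoding="utf-8") as fh:
        text = fh.read().strip()
    if not text:
        raise ValueError(f"{path} is empty; nothing was saved from the API")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Single-quoted text like the sample in the question is a Python
        # literal, not JSON, so fall back to a safe literal parser.
        return ast.literal_eval(text)


def flatten(obj, prefix=""):
    """Return a flat dict whose keys join nested keys and indexes with '_'."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}_"))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{i}_"))
    else:
        flat[prefix[:-1]] = obj
    return flat


def write_csv(payload, csv_path):
    """Flatten one payload and write it as a header plus a single row."""
    row = flatten(payload)
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(row))
        writer.writeheader()
        writer.writerow(row)
    return row
```

Usage would be along the lines of `write_csv(load_payload("new.json")["data"], "weather.csv")`. When saving the API response in the first place, writing it with `json.dump` rather than `str()` avoids the single-quote problem.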