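pandas does not load data from a JSON API correctly

Time: 2020-06-04 17:15:26

Tags: python python-3.x pandas python-2.7 data-science

I am trying to load data from a JSON API into a pandas DataFrame, but pandas does not parse the data correctly. My code and its output are below: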

import pandas as pd
import requests
r = requests.get('https://api.covid19india.org/raw_data5.json')
j = r.json()
df = pd.DataFrame.from_dict(j)

However, the output is not what I expect: every record ends up as a dict inside a single column.

raw_data
0   {'agebracket': '', 'contractedfromwhichpatient...
1   {'agebracket': '', 'contractedfromwhichpatient...
2   {'agebracket': '', 'contractedfromwhichpatient...
3   {'agebracket': '', 'contractedfromwhichpatient...
4   {'agebracket': '', 'contractedfromwhichpatient...

When I run df.info(), I get:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20409 entries, 0 to 20408
Data columns (total 1 columns):
raw_data    20409 non-null object
dtypes: object(1)
memory usage: 159.5+ KB

Can someone help me with this?

2 answers:

Answer 0: (score: 0)

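Use j = r.json()['raw_data'] to select the raw_data key from the JSON. The response is an object with a single top-level key, raw_data, whose value is the list of records, so passing that list (rather than the whole object) to the DataFrame constructor gives one column per field.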

df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20409 entries, 0 to 20408
Data columns (total 20 columns):
 #   Column                               Non-Null Count  Dtype 
---  ------                               --------------  ----- 
 0   agebracket                           20409 non-null  object
 1   contractedfromwhichpatientsuspected  20409 non-null  object
 2   currentstatus                        20409 non-null  object
 3   dateannounced                        20409 non-null  object
 4   detectedcity                         20409 non-null  object
 5   detecteddistrict                     20409 non-null  object
 6   detectedstate                        20409 non-null  object
 7   entryid                              20409 non-null  object
 8   gender                               20409 non-null  object
 9   nationality                          20409 non-null  object
 10  notes                                20409 non-null  object
 11  numcases                             20409 non-null  object
 12  patientnumber                        20409 non-null  object
 13  source1                              20409 non-null  object
 14  source2                              20409 non-null  object
 15  source3                              20409 non-null  object
 16  statecode                            20409 non-null  object
 17  statepatientnumber                   20409 non-null  object
 18  statuschangedate                     20409 non-null  object
 19  typeoftransmission                   20409 non-null  object
dtypes: object(20)
memory usage: 3.1+ MB
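
The difference can be reproduced without calling the API. The following is a minimal sketch that assumes a payload shaped like the one in the question: a dict with one key, raw_data, holding a list of record dicts. The two records and their field values are invented for illustration.

```python
import pandas as pd

# Hypothetical payload mimicking the API's shape: one top-level key
# holding a list of record dicts (values are invented for illustration).
payload = {
    "raw_data": [
        {"agebracket": "", "detectedstate": "Kerala", "numcases": "1"},
        {"agebracket": "45", "detectedstate": "Delhi", "numcases": "2"},
    ]
}

# Passing the whole dict yields a single 'raw_data' column of dicts.
nested = pd.DataFrame.from_dict(payload)

# Passing the list of records yields one column per field.
flat = pd.DataFrame(payload["raw_data"])

print(nested.shape)
print(flat.shape)
print(list(flat.columns))
```

With the real response, payload would be r.json() from the question's code.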

Answer 1: (score: 0)

Try expanding the raw_data column of the DataFrame you already have:

df = df['raw_data'].apply(pd.Series)
df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20409 entries, 0 to 20408
Data columns (total 20 columns):
agebracket                             20409 non-null object
contractedfromwhichpatientsuspected    20409 non-null object
currentstatus                          20409 non-null object
dateannounced                          20409 non-null object
detectedcity                           20409 non-null object
detecteddistrict                       20409 non-null object
detectedstate                          20409 non-null object
entryid                                20409 non-null object
gender                                 20409 non-null object
nationality                            20409 non-null object
notes                                  20409 non-null object
numcases                               20409 non-null object
patientnumber                          20409 non-null object
source1                                20409 non-null object
source2                                20409 non-null object
source3                                20409 non-null object
statecode                              20409 non-null object
statepatientnumber                     20409 non-null object
statuschangedate                       20409 non-null object
typeoftransmission                     20409 non-null object
dtypes: object(20)
memory usage: 3.1+ MB
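
As a side note, apply(pd.Series) builds one Series per row, which tends to be slow on frames with many rows. A possible alternative, sketched here on invented sample records and not part of the original answer, is to hand the column's dicts back to the DataFrame constructor as a list:

```python
import pandas as pd

# Hypothetical frame with the same shape as in the question:
# a single 'raw_data' column whose cells are dicts (values invented).
df = pd.DataFrame({
    "raw_data": [
        {"agebracket": "", "detectedstate": "Kerala", "numcases": "1"},
        {"agebracket": "45", "detectedstate": "Delhi", "numcases": "2"},
    ]
})

# Approach from the answer: expand each dict into a row of columns.
expanded = df["raw_data"].apply(pd.Series)

# Alternative: build the frame directly from the list of dicts.
alt = pd.DataFrame(df["raw_data"].tolist())

print(expanded.shape, alt.shape)
```

Both approaches should produce the same columns and values on data like this; which one is faster on the full dataset is worth measuring rather than assuming.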