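Pandas: reading a JSON file without headers

Date: 2019-08-30 09:35:16

Tags: python pandas

I would like to know how to read this JSON file into a Pandas DataFrame and set new headers, since my source does not have any. I am trying to get date, street, and suburb as the headers.

As an example, Kent Street is the street and Karawara is the suburb.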

{
    "25 March 2019": {
        "Albany Highway": ["Maddington", "Cannington"],
        "Kent Street": ["Karawara"],
        "Kitchener Road": ["Alfred Cove"],
        "Alexander Road": ["Rivervale"],
        "Kwinana Freeway": ["Wellard"],
    },
    "26 March 2019": {
        "Great Eastern Highway": ["Sawyers Valley", "Redcliffe"],
        "South Western Highway": ["Armadale", "Wungong"],
        "Great Northern Highway": ["Muchea", "Baskerville"],
        "St Thomas Primary": ["Claremont"],
        "Stirling Highway": ["Claremont"],
        "Grovelands Primary": ["Camillo"],
        "Swan View Senior High": ["Swan View"],
    }
}

The expected output is something like:

[
    {
        "date": "25 March 2019",
        "street": "Kent Street",
        "suburb": "Karawara"
    }, {
        "date": "26 March 2019",
        "street": "St Thomas Primary",
        "suburb": "Claremont"
    }
]

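Rule: the first value is always the street and the second value is the suburb. In some cases there are two suburbs. Conceptually that would be two rows, but if not, I am fine with leaving it as one row.

I found similar questions such as "Pandas read nested json", but could not find any example where the JSON file has no headers at all.

1 Answer:

Answer 0: (score: 3)

If I understand correctly, you need the following.

First, read the JSON file and convert it to a dictionary: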

import json

# Replace '<yourFile>.json' with the path to your file.
with open('<yourFile>.json', 'r') as f:
    json_dict = json.load(f)
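
One caveat worth checking: the sample as posted has a trailing comma after the last entry of each inner object, and Python's json module rejects that. If the real file looks exactly like the sample, one possible workaround (my assumption, not something the question confirms) is to fall back to ast.literal_eval, because the text is also a valid Python literal. The text variable below is a hypothetical stand-in for the file contents.

```python
import ast
import json

# Hypothetical file content mirroring the sample, trailing comma included.
text = '''{
    "25 March 2019": {
        "Kent Street": ["Karawara"],
    }
}'''

try:
    json_dict = json.loads(text)
except json.JSONDecodeError:
    # Strict JSON forbids trailing commas; a Python dict literal allows them.
    json_dict = ast.literal_eval(text)

print(json_dict)
```

This fallback only works while the data contains nothing but strings, lists, and objects; JSON literals such as true, false, or null are not valid Python literals.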

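Then I assume you have something like this (called x below; with the file loaded as above, x would be json_dict):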

x={
    "25 March 2019": {
        "Albany Highway": ["Maddington", "Cannington"],
        "Kent Street": ["Karawara"],
        "Kitchener Road": ["Alfred Cove"],
        "Alexander Road": ["Rivervale"],
        "Kwinana Freeway": ["Wellard"],
    },
    "26 March 2019": {
        "Great Eastern Highway": ["Sawyers Valley", "Redcliffe"],
        "South Western Highway": ["Armadale", "Wungong"],
        "Great Northern Highway": ["Muchea", "Baskerville"],
        "St Thomas Primary": ["Claremont"],
        "Stirling Highway": ["Claremont"],
        "Grovelands Primary": ["Camillo"],
        "Swan View Senior High": ["Swan View"],
    }
}

You can do this:

import pandas as pd

# Iterate over x.items() so that each date stays paired with its own streets.
df = pd.DataFrame([(date, street, suburbs) for date, streets in x.items() for street, suburbs in streets.items()], columns=['date', 'street', 'suburb'])

print(df)

             date                  street                       suburb
0   25 March 2019          Albany Highway     [Maddington, Cannington]
1   25 March 2019             Kent Street                   [Karawara]
2   25 March 2019          Kitchener Road                [Alfred Cove]
3   25 March 2019          Alexander Road                  [Rivervale]
4   25 March 2019         Kwinana Freeway                    [Wellard]
5   26 March 2019   Great Eastern Highway  [Sawyers Valley, Redcliffe]
6   26 March 2019   South Western Highway          [Armadale, Wungong]
7   26 March 2019  Great Northern Highway        [Muchea, Baskerville]
8   26 March 2019       St Thomas Primary                  [Claremont]
9   26 March 2019        Stirling Highway                  [Claremont]
10  26 March 2019      Grovelands Primary                    [Camillo]
11  26 March 2019   Swan View Senior High                  [Swan View]
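
The question mentions that a street with two suburbs is conceptually two rows. A minimal sketch of one way to get there, assuming pandas 0.25 or later (where DataFrame.explode is available) and using a shortened copy of the sample data; the name flat is only illustrative:

```python
import pandas as pd

# Shortened copy of the sample data from the question.
x = {
    "25 March 2019": {
        "Albany Highway": ["Maddington", "Cannington"],
        "Kent Street": ["Karawara"],
    },
    "26 March 2019": {
        "St Thomas Primary": ["Claremont"],
    },
}

# One row per (date, street); the suburb column still holds lists.
df = pd.DataFrame(
    [(date, street, suburbs)
     for date, streets in x.items()
     for street, suburbs in streets.items()],
    columns=["date", "street", "suburb"],
)

# explode() repeats date and street once for each suburb in the list.
flat = df.explode("suburb").reset_index(drop=True)
print(flat)
```

After the explode step, every cell in the suburb column is a plain string, so "Albany Highway" appears twice, once for each of its suburbs.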

Alternatively, you can do this:

dic = [{'date': date, 'street': street, 'suburb': suburbs} for date, streets in x.items() for street, suburbs in streets.items()]

dic

[{'date': '25 March 2019',
  'street': 'Albany Highway',
  'suburb': ['Maddington', 'Cannington']},
 {'date': '25 March 2019', 'street': 'Kent Street', 'suburb': ['Karawara']},
 {'date': '25 March 2019',
  'street': 'Kitchener Road',
  'suburb': ['Alfred Cove']},
 {'date': '25 March 2019',
  'street': 'Alexander Road',
  'suburb': ['Rivervale']},
 {'date': '25 March 2019', 'street': 'Kwinana Freeway', 'suburb': ['Wellard']},

...

This gives a list of dictionaries. You can now convert it to a DataFrame like this:

df = pd.DataFrame(dic)

print(df)

             date                  street                       suburb
0   25 March 2019          Albany Highway     [Maddington, Cannington]
1   25 March 2019             Kent Street                   [Karawara]
2   25 March 2019          Kitchener Road                [Alfred Cove]
3   25 March 2019          Alexander Road                  [Rivervale]
4   25 March 2019         Kwinana Freeway                    [Wellard]
5   26 March 2019   Great Eastern Highway  [Sawyers Valley, Redcliffe]
6   26 March 2019   South Western Highway          [Armadale, Wungong]
7   26 March 2019  Great Northern Highway        [Muchea, Baskerville]
8   26 March 2019       St Thomas Primary                  [Claremont]
9   26 March 2019        Stirling Highway                  [Claremont]
10  26 March 2019      Grovelands Primary                    [Camillo]
11  26 March 2019   Swan View Senior High                  [Swan View]
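
If the goal is the record shape shown in the question's expected output, with a single suburb string per record, a third loop over the suburb list can flatten the data before it reaches pandas. This is only a sketch on a shortened copy of the sample data, and the name records is illustrative:

```python
import json

# Shortened copy of the sample data from the question.
x = {
    "25 March 2019": {
        "Albany Highway": ["Maddington", "Cannington"],
        "Kent Street": ["Karawara"],
    },
    "26 March 2019": {
        "St Thomas Primary": ["Claremont"],
    },
}

# One dictionary per (date, street, suburb) combination.
records = [
    {"date": date, "street": street, "suburb": suburb}
    for date, streets in x.items()
    for street, suburbs in streets.items()
    for suburb in suburbs
]

print(json.dumps(records, indent=4))
```

Passing records to pd.DataFrame would then give a frame whose suburb column holds plain strings instead of lists.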