Extract part of a JSON file and create a Pandas DataFrame

Time: 2017-09-14 12:53:50

Tags: json dataframe

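I'm new to Python, especially with respect to JSON files. I'm trying to create a DataFrame and have searched for information, but I keep getting errors and just can't find the right track.

I want to load part of a JSON file into a DataFrame. The DataFrame (just 3 specific columns from the JSON) should look like this:

SECS  WATTS  CAD
0     291    93
1     349    96
2     478    98
3     etc.

I have the data stored in a single JSON file (called train.json). See below for what it looks like.

Apologies if this is a simple question, but I just can't get on the right track. Any help that points me in the right direction is welcome. Sorry if this question isn't formatted the way it should be; this is my first time asking a question here.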

{
    "RIDE":{
        "STARTTIME":"2017\/09\/09 14:30:32 UTC ",
        "RECINTSECS":1,
        "DEVICETYPE":"SRM PC8 ",
        "IDENTIFIER":" ",
        "OVERRIDES":[
            { "total_distance":{ "value":"108.9" }}
        ],
        "TAGS":{
            "Aerobic TISS":"0 ",
            "Anaerobic TISS":"0 ",
            "Athlete":"Ruud Goorden ",
            "Average Cadence":"0 ",
            "Average Heart Rate":"0 ",
            "Daniels EqP":"0 ",
            "Daniels Points":"0 ",
            "Data":"T--PHC-AG--EV-- ",
            "Device":"SRM PC8 ",
            "Device Info":" "
        },
        "SAMPLES":[
            { "SECS":0, "WATTS":291, "CAD":93, "HR":122, "ALT":-5, "LAT":51.472068788, "LON":3.8169967494, "TEMP":23, "LRBALANCE":45 },
            { "SECS":1, "WATTS":349, "CAD":96, "HR":121, "ALT":-4, "LAT":51.472003912, "LON":3.8171036187, "TEMP":23, "LRBALANCE":44 },
            { "SECS":2, "WATTS":478, "CAD":98, "HR":124, "ALT":-5, "LAT":51.471939036, "LON":3.8172316103, "TEMP":23, "LRBALANCE":44 },
            { "SECS":3, "WATTS":286, "CAD":95, "HR":125, "ALT":-5, "LAT":51.471866617, "LON":3.8173634577, "TEMP":23, "LRBALANCE":45 }
        ]
    }
}

1 Answer:

Answer 0 (score: 0)

Reading the file with df = pd.read_json('train.json') will give you:

                                                         RIDE
DEVICETYPE                                           SRM PC8 
IDENTIFIER                                                   
OVERRIDES            [{'total_distance': {'value': '108.9'}}]
RECINTSECS                                                  1
SAMPLES     [{'SECS': 0, 'WATTS': 291, 'CAD': 93, 'HR': 12...
STARTTIME                            2017/09/09 14:30:32 UTC 
TAGS        {'Aerobic TISS': '0 ', 'Anaerobic TISS': '0 ',...

This is a DataFrame with a single column, and each cell holds a JSON document. We need to unpack the cells that hold compound values:

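import pandas as pd
df = pd.read_json('train.json')
# pd.json_normalize needs pandas >= 1.0; older versions use pd.io.json.json_normalize
pd.json_normalize(df.RIDE.OVERRIDES)
pd.json_normalize(df.RIDE.SAMPLES)
pd.json_normalize(df.RIDE.TAGS).T  # transpose for easy reading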

Those give you:

  total_distance.value
0                108.9

---

   ALT  CAD   HR        LAT       LON  LRBALANCE  SECS  TEMP  WATTS
0   -5   93  122  51.472069  3.816997         45     0    23    291
1   -4   96  121  51.472004  3.817104         44     1    23    349
2   -5   98  124  51.471939  3.817232         44     2    23    478
3   -5   95  125  51.471867  3.817363         45     3    23    286

---
                                   0
Aerobic TISS                      0 
Anaerobic TISS                    0 
Athlete                Ruud Goorden 
Average Cadence                   0 
Average Heart Rate                0 
Daniels EqP                       0 
Daniels Points                    0 
Data                T--PHC-AG--EV-- 
Device                      SRM PC8 
Device Info                         

Hopefully you can take it from here.
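To get from the SAMPLES table to the three columns the question asks for, something like the following should work. It is only a sketch: the inline string `raw` is a trimmed stand-in for train.json (an assumption made so the example runs on its own), and the names `samples` and `wanted` are illustrative. It also skips pd.read_json and builds the frame straight from the parsed list, which is an alternative route rather than the one used in the answer above.

```python
import json

import pandas as pd

# A trimmed stand-in for train.json; with the real file, use
#   with open("train.json") as f: data = json.load(f)
raw = """
{"RIDE": {"RECINTSECS": 1,
          "SAMPLES": [
              {"SECS": 0, "WATTS": 291, "CAD": 93, "HR": 122},
              {"SECS": 1, "WATTS": 349, "CAD": 96, "HR": 121},
              {"SECS": 2, "WATTS": 478, "CAD": 98, "HR": 124},
              {"SECS": 3, "WATTS": 286, "CAD": 95, "HR": 125}]}}
"""
data = json.loads(raw)

# SAMPLES is a flat list of dicts, so the DataFrame constructor can take it directly.
samples = pd.DataFrame(data["RIDE"]["SAMPLES"])

# Keep only the three requested columns, in the requested order.
wanted = samples[["SECS", "WATTS", "CAD"]]
print(wanted)
```

If you already have the frame produced by json_normalize(df.RIDE.SAMPLES), the same column selection with [["SECS", "WATTS", "CAD"]] applies to it unchanged.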