Flattening a DataFrame with nested dictionaries

Time: 2020-08-09 14:34:44

Tags: python pandas dictionary nested

     [
      {
        "match_hometeam_score": "2 ",
        "match_awayteam_score": " 0",    
        "statistics": [
              {
                "type": "Ball Possession",
                "home": "70%",
                "away": "30%"
              },
              {
                "type": "Goal Attempts",
                "home": "6",
                "away": "3"
              },
              {
                "type": "Shots on Goal",
                "home": "4",
                "away": "1"
              },
              {
                "type": "Shots off Goal",
                "home": "1",
                "away": "2"
              },
              {
                "type": "Blocked Shots",
                "home": "1",
                "away": "0"
              },
              {
                "type": "Free Kicks",
                "home": "10",
                "away": "12"
              },
              {
                "type": "Corner Kicks",
                "home": "5",
                "away": "2"
              },
              {
                "type": "Offsides",
                "home": "2",
                "away": "1"
              },
              {
                "type": "Goalkeeper Saves",
                "home": "1",
                "away": "2"
              },
              {
                "type": "Fouls",
                "home": "11",
                "away": "9"
              },
              {
                "type": "Yellow Cards",
                "home": "2",
                "away": "0"
              },
              {
                "type": "Total Passes",
                "home": "657",
                "away": "272"
              },
              {
                "type": "Tackles",
                "home": "11",
                "away": "18"
              }
            ]
          },
          .....
        ]

Here is a small sample of the JSON file I have. I want to flatten it by extracting the values in the "statistics" column.

I tried:

flat_matches = pd.concat([all_matches.drop(['statistics'],axis=1),all_matches['statistics'].apply(pd.Series)], axis=1)

It works to some extent, but not the way I hoped. I want to create a new df with these columns:

  1. index
  2. match_hometeam_score
  3. match_awayteam_score
  4. GoalAttempts_home
  5. GoalAttempts_away
  6. Shots_on_Goal_home
  7. Shots_on_Goal_away
  8. DangerousAttacks_home
  9. DangerousAttacks_away

The CSV data is as follows:

,match_hometeam_score,match_awayteam_score,statistics
0,3,1,"[{'type': 'Ball Possession', 'home': '44%', 'away': '56%'}, {'type': 'Goal Attempts', 'home': '15', 'away': '6'}, {'type': 'Shots on Goal', 'home': '5', 'away': '5'}, {'type': 'Shots off Goal', 'home': '9', 'away': '1'}, {'type': 'Blocked Shots', 'home': '1', 'away': '0'}, {'type': 'Corner Kicks', 'home': '3', 'away': '3'}, {'type': 'Offsides', 'home': '4', 'away': '2'}, {'type': 'Goalkeeper Saves', 'home': '4', 'away': '2'}, {'type': 'Fouls', 'home': '11', 'away': '10'}, {'type': 'Yellow Cards', 'home': '2', 'away': '4'}, {'type': 'Total Passes', 'home': '382', 'away': '503'}, {'type': 'Tackles', 'home': '13', 'away': '16'}, {'type': 'Attacks', 'home': '97', 'away': '136'}, {'type': 'Dangerous Attacks', 'home': '45', 'away': '63'}]"
1,1,2,"[{'type': 'Ball Possession', 'home': '61%', 'away': '39%'}, {'type': 'Goal Attempts', 'home': '22', 'away': '12'}, {'type': 'Shots on Goal', 'home': '10', 'away': '7'}, {'type': 'Shots off Goal', 'home': '6', 'away': '3'}, {'type': 'Blocked Shots', 'home': '6', 'away': '2'}, {'type': 'Corner Kicks', 'home': '7', 'away': '2'}, {'type': 'Offsides', 'home': '0', 'away': '2'}, {'type': 'Goalkeeper Saves', 'home': '5', 'away': '9'}, {'type': 'Fouls', 'home': '12', 'away': '13'}, {'type': 'Yellow Cards', 'home': '4', 'away': '4'}, {'type': 'Total Passes', 'home': '421', 'away': '271'}, {'type': 'Tackles', 'home': '14', 'away': '24'}, {'type': 'Attacks', 'home': '97', 'away': '86'}, {'type': 'Dangerous Attacks', 'home': '43', 'away': '46'}]"
2,1,2,"[{'type': 'Ball Possession', 'home': '48%', 'away': '52%'}, {'type': 'Goal Attempts', 'home': '16', 'away': '14'}, {'type': 'Shots on Goal', 'home': '4', 'away': '6'}, {'type': 'Shots off Goal', 'home': '6', 'away': '5'}, {'type': 'Blocked Shots', 'home': '6', 'away': '3'}, {'type': 'Corner Kicks', 'home': '4', 'away': '4'}, {'type': 'Offsides', 'home': '2', 'away': '6'}, {'type': 'Goalkeeper Saves', 'home': '4', 'away': '3'}, {'type': 'Fouls', 'home': '11', 'away': '14'}, {'type': 'Yellow Cards', 'home': '2', 'away': '7'}, {'type': 'Total Passes', 'home': '594', 'away': '643'}, {'type': 'Tackles', 'home': '24', 'away': '16'}, {'type': 'Attacks', 'home': '144', 'away': '130'}, {'type': 'Dangerous Attacks', 'home': '77', 'away': '36'}]"
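One detail of this CSV matters for any flattening approach: the `statistics` field is a quoted, Python-style list, so `pd.read_csv` returns it as a plain string rather than a list of dicts. Below is a minimal sketch of converting it back, assuming `ast.literal_eval` is acceptable for this data. The `csv_text` variable is a shortened, hypothetical stand-in for the rows above (two statistics per match instead of fourteen).

```python
import ast
import io

import pandas as pd

# Shortened stand-in for the CSV above (assumption: same layout, fewer statistics).
csv_text = ''',match_hometeam_score,match_awayteam_score,statistics
0,3,1,"[{'type': 'Goal Attempts', 'home': '15', 'away': '6'}, {'type': 'Shots on Goal', 'home': '5', 'away': '5'}]"
1,1,2,"[{'type': 'Goal Attempts', 'home': '22', 'away': '12'}, {'type': 'Shots on Goal', 'home': '10', 'away': '7'}]"
'''

all_matches = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Straight after read_csv, each cell in "statistics" is a string, not a list.
was_string = isinstance(all_matches.loc[0, "statistics"], str)

# literal_eval parses Python literals (lists, dicts, strings, numbers) without
# executing arbitrary code, which makes it a safer choice than eval here.
all_matches["statistics"] = all_matches["statistics"].apply(ast.literal_eval)

first_stat = all_matches.loc[0, "statistics"][0]
```

When the data is loaded from the original JSON file instead of the CSV, this step should not be needed, because the nested lists are already parsed.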

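Thank you very much for any help! Please tell me how to flatten this JSON dataset to a single level. I am a beginner and a hobbyist. If I can improve the quality of the question, feel free to give me hints.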

The result I would like is a DataFrame with the columns listed above.

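1 answer:

Answer 0 (score: 0)

The following shows how to convert a given row of the DataFrame. You need to iterate over the rows and build the DataFrame as shown below.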

import json
import pandas as pd

sample_row = [{'type': 'Ball Possession', 'home': '44%', 'away': '56%'}, {'type': 'Goal Attempts', 'home': '15', 'away': '6'}, {'type': 'Shots on Goal', 'home': '5', 'away': '5'}, {'type': 'Shots off Goal', 'home': '9', 'away': '1'}, {'type': 'Blocked Shots', 'home': '1', 'away': '0'}, {'type': 'Corner Kicks', 'home': '3', 'away': '3'}, {'type': 'Offsides', 'home': '4', 'away': '2'}, {'type': 'Goalkeeper Saves', 'home': '4', 'away': '2'}, {'type': 'Fouls', 'home': '11', 'away': '10'}, {'type': 'Yellow Cards', 'home': '2', 'away': '4'}, {'type': 'Total Passes', 'home': '382', 'away': '503'}, {'type': 'Tackles', 'home': '13', 'away': '16'}, {'type': 'Attacks', 'home': '97', 'away': '136'}, {'type': 'Dangerous Attacks', 'home': '45', 'away': '63'}]

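# Round-trip through JSON, then expand the list of dicts into one row per
# statistic (columns: type, home, away).
js = json.dumps(sample_row)
df = pd.json_normalize(json.loads(js))

# Attach the match scores to every statistic row of this match.
df['match_hometeam_score'] = [3] * len(df)
df['match_awayteam_score'] = [1] * len(df)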
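The snippet above yields one row per statistic for a single match. To get one row per match with `<Type>_home` / `<Type>_away` columns, as the question asks, the iteration could be sketched as follows. This is only one possible approach: the `flatten_match` helper and the shortened `matches` list are hypothetical, and spaces in the statistic names are replaced with underscores throughout (so `Goal_Attempts_home` rather than the question's `GoalAttempts_home`).

```python
import pandas as pd

# Shortened stand-in for the question's data (assumption: two statistics per match).
matches = [
    {
        "match_hometeam_score": "3",
        "match_awayteam_score": "1",
        "statistics": [
            {"type": "Goal Attempts", "home": "15", "away": "6"},
            {"type": "Shots on Goal", "home": "5", "away": "5"},
        ],
    },
    {
        "match_hometeam_score": "1",
        "match_awayteam_score": "2",
        "statistics": [
            {"type": "Goal Attempts", "home": "22", "away": "12"},
            {"type": "Dangerous Attacks", "home": "43", "away": "46"},
        ],
    },
]


def flatten_match(match):
    """Return one flat dict per match: scores plus <Type>_home / <Type>_away."""
    # Keep every top-level field except the nested list.
    row = {key: value for key, value in match.items() if key != "statistics"}
    for stat in match["statistics"]:
        name = stat["type"].replace(" ", "_")
        row[f"{name}_home"] = stat["home"]
        row[f"{name}_away"] = stat["away"]
    return row


# A statistic that is missing for a match ends up as NaN in that match's row.
flat_matches = pd.DataFrame([flatten_match(m) for m in matches])
```

If the data is already in a DataFrame such as `all_matches`, `all_matches.to_dict("records")` gives a list of dicts in the same shape as `matches`, provided the `statistics` cells hold real lists rather than strings. The values stay strings (for example `'15'` or `'44%'`), so a numeric conversion would be a separate step.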
