Problem parsing nested JSON lists/dicts

Time: 2019-04-15 14:51:55

Tags: python json

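I am writing a script that creates a CSV from parsed JSON. I can read the JSON, but I am stuck on TypeError: list indices must be integers, not dict.

I have an earlier version that works, but it used different JSON with a slightly different structure. I am trying to practice my reverse-engineering skills, and the slight difference in how the values need to be extracted has me confused.

The JSON is structured as follows: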

{
 "league": {
   "games": {
     "0": [
      {
       "game": {
          "game_number": "game_1",
          "season": "2019",
          "start_time": "Sat, 13 Apr 2019 23:00:00", 
          "team_id": [
              {
              "away_team": "team_x"
              },
              {
              "home_team": "team_a"
              },
           ],
          },
         },
        ],
       },
      },
}

data = json_parsed['league']['games'][0]

with open('./soccer_041519.csv', 'w+') as csvFile:

    for game in data:
        gameid = data[game]['game_number']
        start_time = data[game]['start_time']
        home_team_id = data[game]['home_team']
        away_team_id = data[game]['away_team']
        csvFile.write("%s @ %s,%s, ,%s\n"%(away_team_id, home_team_id, gameid, start_time))


The values should be written to the CSV

2 answers:

Answer 0 (score: 1)

I noticed a few things about the JSON data:

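  1. A lot of commas are missing.

  2. You are trying to access the element with the integer index 0, when it is actually stored under the string key "0".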
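Point 2 can be checked in isolation. The following is a minimal sketch that uses a trimmed-down, hypothetical JSON string rather than the real data: once parsed, the keys of a JSON object are always strings, so the integer 0 is not a key of the games dict.

```python
import json

# Hypothetical, trimmed-down version of the structure in the question.
raw = '{"league": {"games": {"0": [{"game": {"game_number": "game_1"}}]}}}'
games = json.loads(raw)["league"]["games"]

# JSON object keys always come back as str, so the lookup needs "0".
key_types = [type(k).__name__ for k in games]
by_string = games["0"]          # the list of game entries

try:
    games[0]                    # the int 0 is not a key of this dict
    lookup_error = None
except KeyError as exc:
    lookup_error = repr(exc)
```

Here `by_string` holds the list of games, while the integer lookup ends up in the `except` branch.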

So, fix 1:

{
 "league": {
   "games": {
     "0": [
      {
       "game": {
          "game_number": "game_1"   <---- need comma
          "season": "2019"    <-----need comma
          "start_time": "Sat, 13 Apr 2019 23:00:00"  <-----need comma
          "team_id": [
              {
              "away_team": "team_x"
              }   <-----need comma
              {
              "home_team": "team_a"
              }
           ]
          }
         }
        ]
       }
      }
}

With the commas added, it looks like this:

json_parsed = {"league": {
   "games": {
     "0": [
      {
       "game": {
          "game_number": "game_1",
          "season": "2019",
          "start_time": "Sat, 13 Apr 2019 23:00:00" ,
          "team_id": [
              {
              "away_team": "team_x"
              },
              {
              "home_team": "team_a"
              }
           ]
          }
         }
        ]
       }
      }
}
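If the data arrives as text rather than as a Python literal, one way to locate a missing comma is to let the parser point at it. This is only a sketch, assuming the standard-library json module and a made-up broken snippet (the variable names are illustrative):

```python
import json

# Made-up snippet with the comma missing after "game_1".
broken = '{\n  "game_number": "game_1"\n  "season": "2019"\n}'

try:
    json.loads(broken)
    location = None
except json.JSONDecodeError as exc:
    # msg describes the problem; lineno and colno are 1-based and
    # point at the place where the parser gave up.
    location = (exc.msg, exc.lineno, exc.colno)
```

The reported position is where the parser noticed the problem, which is usually the token just after the missing comma.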

Fix 2:

data = json_parsed['league']['games']['0']

Then loop over it:

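for entry in data:
    game = entry['game']   # each list element wraps its fields in a 'game' dict
    gameid = game['game_number']
    start_time = game['start_time']
    # team_id is a list of single-key dicts, so merge them into one dict first
    teams = {k: v for d in game['team_id'] for k, v in d.items()}
    home_team_id = teams['home_team']
    away_team_id = teams['away_team']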
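Putting both fixes together, the whole path from the parsed data to CSV text might look like the sketch below. The helper name `game_rows` and the in-memory buffer are assumptions made for illustration; the column layout follows the format string in the question.

```python
import csv
import io

# Same structure as the corrected data in this answer.
json_parsed = {"league": {"games": {"0": [{"game": {
    "game_number": "game_1",
    "season": "2019",
    "start_time": "Sat, 13 Apr 2019 23:00:00",
    "team_id": [{"away_team": "team_x"}, {"home_team": "team_a"}],
}}]}}}


def game_rows(parsed):
    """Yield one CSV row per game stored under league -> games -> "0"."""
    for entry in parsed["league"]["games"]["0"]:
        game = entry["game"]
        # Merge the single-key dicts in team_id into one lookup table.
        teams = {k: v for d in game["team_id"] for k, v in d.items()}
        matchup = "%s @ %s" % (teams["away_team"], teams["home_team"])
        yield [matchup, game["game_number"], " ", game["start_time"]]


# Written to an in-memory buffer here; open(path, "w", newline="")
# could be used instead to produce a real file.
buffer = io.StringIO()
csv.writer(buffer).writerows(game_rows(json_parsed))
csv_text = buffer.getvalue()
```

Using the csv module rather than a hand-built format string matters for this data: start_time contains a comma, and the writer quotes that field so it stays a single column.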

Answer 1 (score: 0)

for game in data:
    gameid = data[game]['game_number']

You don't need to index into data, because the loop variable already holds each individual game.

Use gameid = game['game']['game_number'] instead.
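The TypeError from the question can be reproduced with a small stand-in list. This is a sketch under the assumption that data is the list stored under "0"; the list contents are trimmed down for illustration.

```python
# Hypothetical stand-in for the list stored under "0" in the question.
data = [{"game": {"game_number": "game_1"}}]

for game in data:
    # `game` is already the element (a dict), not a position in the list.
    gameid = game["game"]["game_number"]

    try:
        data[game]              # indexing the list with that dict
        message = None
    except TypeError as exc:
        message = str(exc)      # the "list indices must be integers" error
```

Iterating over a list yields its elements, so using the loop variable as an index passes a dict where an integer is expected.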

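Also, your use of a list for team_id is confusing. Why is it a list? Lists are normally used when you don't know how many items there will be, but here you seem to know that there will always be exactly one home team and one away team, right?

A single dict looks like a better way to represent team_id: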

"team_id": {
    "away_team": "team_x",
    "home_team": "team_a"
}

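Or better yet, why does this sub-dictionary exist at all? It seems that away_team_id and home_team_id could simply be elements of the main-level dictionary. Why do they need to be isolated on their own?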
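To make the comparison concrete, here is a sketch of both suggested layouts. The data is hypothetical and the key names are borrowed from the question; neither layout is what the real feed produces.

```python
# Option 1 (hypothetical): team_id as a single dict instead of a list.
nested = {
    "game_number": "game_1",
    "team_id": {"away_team": "team_x", "home_team": "team_a"},
}
away_nested = nested["team_id"]["away_team"]

# Option 2 (hypothetical): team names stored directly on the game.
flat = {
    "game_number": "game_1",
    "away_team": "team_x",
    "home_team": "team_a",
}
away_flat = flat["away_team"]

# Either way there is no list to walk; each field is a plain key lookup.
matchup = "%s @ %s" % (flat["away_team"], flat["home_team"])
```

If the JSON comes from a source you do not control, the list of single-key dicts has to be handled as it is, and these layouts only apply once you can reshape the data yourself.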