Python pandas json_normalize

Asked: 2016-02-26 09:40:11

Tags: python json python-2.7 python-3.x pandas

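I have the following JSON output, which I am trying to convert into a pandas DataFrame with json_normalize. I can get to the 'runners' level with json_normalize(data, ['runners']), but I can't get to the 'ex' level.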

[{
    u'status': u'OPEN',
    u'isMarketDataDelayed': False,
    u'numberOfRunners': 9,
    u'complete': True,
    u'bspReconciled': False,
    u'runnersVoidable': False,
    u'betDelay': 0,
    u'marketId': u'1.123264244',
    u'crossMatching': False,
    u'totalMatched': 4.22,
    u'version': 1241856317,
    u'lastMatchTime': u'2016-02-25T10:32:25.704Z',
    u'numberOfWinners': 1,
    u'inplay': False,
    u'numberOfActiveRunners': 9,
    u'totalAvailable': 39.26,
    u'runners': [{
            u'status': u'ACTIVE',
            u'handicap': 0.0,
            u'selectionId': 10861647,
            u'totalMatched': 0.0,
            u'adjustmentFactor': 16.631,
            u'ex': {
                u'availableToBack': [{
                        u'price': 1.02,
                        u'size': 2.15
                    }
                ],
                u'availableToLay': [],
                u'tradedVolume': []
            }
        }, {
            u'status': u'ACTIVE',
            u'handicap': 0.0,
            u'selectionId': 10861648,
            u'totalMatched': 0.0,
            u'adjustmentFactor': 13.237,
            u'ex': {
                u'availableToBack': [{
                        u'price': 1.01,
                        u'size': 7.11
                    }
                ],
                u'availableToLay': [],
                u'tradedVolume': []
            }
        },

With other data I can simply use json_normalize(data, ['runners', 'ex']), but if I do that in this case I get:

                   0
0    availableToBack
1     availableToLay
2       tradedVolume
3    availableToBack
4     availableToLay
5       tradedVolume
6    availableToBack
7     availableToLay
8       tradedVolume
9    availableToBack
10    availableToLay
11      tradedVolume
12   availableToBack
13    availableToLay
14      tradedVolume
15   availableToBack
16    availableToLay
17      tradedVolume
18   availableToBack
19    availableToLay
20      tradedVolume
21   availableToBack
22    availableToLay
23      tradedVolume
24   availableToBack
25    availableToLay
26      tradedVolume
27   availableToBack
28    availableToLay
29      tradedVolume
..               ...

Can you help me with this?

1 answer:

Answer 0 (score: 0)

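  • data is a nested list of dicts.
  • Several of the nested values are lists, which need to be exploded.
  • 'ex.availableToBack' is a list of dicts, which is normalized into the columns 'price' and 'size'.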
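To see why a record path ending at 'ex' produced only key names in the question, it may help to look at one runner in isolation. The sketch below assumes the structure shown in the question; `runner`, `ex_keys` and `back_levels` are illustrative names, not part of the original post. More recent pandas releases may raise an error for such a path instead of returning the keys.

```python
# One runner, copied from the question's data
runner = {
    'selectionId': 10861647,
    'ex': {
        'availableToBack': [{'price': 1.02, 'size': 2.15}],
        'availableToLay': [],
        'tradedVolume': [],
    },
}

ex = runner['ex']

# 'ex' is a dict, not a list of records, so iterating it yields only its keys
ex_keys = list(ex)
print(ex_keys)

# the actual records sit one level deeper, inside each list
back_levels = ex['availableToBack']
print(back_levels)
```

Treating 'ex' as a list of records therefore walks over the three key names for each runner, which matches the repeating pattern in the question's output.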

Normalize all of data

import pandas as pd

# if you want all of data, load data into a dataframe
df = pd.json_normalize(data)

# runners is a list that needs to be exploded
df = df.explode('runners').reset_index(drop=True)

# runners is a column of dicts that need to be normalized
runners = pd.json_normalize(df.pop('runners'))

# there are a number of columns that are lists that must be exploded
runners = runners.apply(pd.Series.explode)

# flatten ex.availableToBack
runners = runners.join(pd.DataFrame(runners.pop('ex.availableToBack').values.tolist()))

# add a prefix to all the runners column names
runners.columns = [f'runners_{v}' for v in runners.columns]

# join df and runners
df = df.join(runners)

# extract the ex columns
ex_cols = df.iloc[:, -4:].copy()

# display(df)
   betDelay  bspReconciled  complete  crossMatching  inplay  isMarketDataDelayed             lastMatchTime     marketId  numberOfActiveRunners  numberOfRunners  numberOfWinners  runnersVoidable status  totalAvailable  totalMatched     version  runners_adjustmentFactor  runners_handicap  runners_selectionId runners_status  runners_totalMatched runners_ex.availableToLay runners_ex.tradedVolume  runners_price  runners_size
0         0          False      True          False   False                False  2016-02-25T10:32:25.704Z  1.123264244                      9                9                1            False   OPEN           39.26          4.22  1241856317                    16.631               0.0             10861647         ACTIVE                   0.0                       NaN                     NaN           1.02          2.15
1         0          False      True          False   False                False  2016-02-25T10:32:25.704Z  1.123264244                      9                9                1            False   OPEN           39.26          4.22  1241856317                    13.237               0.0             10861648         ACTIVE                   0.0                       NaN                     NaN           1.01          7.11

# display(ex_cols)
  runners_ex.availableToLay runners_ex.tradedVolume  runners_price  runners_size
0                       NaN                     NaN           1.02          2.15
1                       NaN                     NaN           1.01          7.11
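The apply(pd.Series.explode) step above appears to rely on every list column in a row exploding to the same number of rows, which holds for the sample but may not hold when a runner has several price levels on one side and fewer on the other. As a hedged alternative, the sketch below builds a long-form table with plain loops. The helper `ladder_rows`, the `side` column and the `sample` values are assumptions made for illustration and do not come from the original answer.

```python
import pandas as pd

def ladder_rows(markets):
    """Yield one flat dict per price level for every runner and every 'ex' list."""
    for market in markets:
        for runner in market.get('runners', []):
            for side, levels in runner.get('ex', {}).items():
                for level in levels:
                    yield {
                        'marketId': market['marketId'],
                        'selectionId': runner['selectionId'],
                        'side': side,  # which 'ex' list the level came from
                        **level,       # price and size
                    }

# made-up sample: two back levels but only one lay level for the same runner
sample = [{
    'marketId': '1.123264244',
    'runners': [{
        'selectionId': 10861647,
        'ex': {
            'availableToBack': [{'price': 1.02, 'size': 2.15},
                                {'price': 1.01, 'size': 7.11}],
            'availableToLay': [{'price': 1.05, 'size': 3.0}],
            'tradedVolume': [],
        },
    }],
}]

ladder = pd.DataFrame(ladder_rows(sample))
print(ladder)
```

Each price level becomes its own row, so lists of different lengths do not need to line up; empty lists simply contribute no rows.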

Normalize only the 'runners' key

# normalize the runners key to get ex
runners = pd.json_normalize(data, record_path=['runners'])

# there are a number of columns that are lists that must be exploded
runners = runners.apply(pd.Series.explode).reset_index(drop=True)

# flatten ex.availableToBack
runners = runners.join(pd.DataFrame(runners.pop('ex.availableToBack').values.tolist()))

# extract the ex columns
ex_cols = runners.iloc[:, -4:].copy()

# display(runners)
   adjustmentFactor  handicap  selectionId  status  totalMatched ex.availableToLay ex.tradedVolume  price  size
0            16.631       0.0     10861647  ACTIVE           0.0               NaN             NaN   1.02  2.15
1            13.237       0.0     10861648  ACTIVE           0.0               NaN             NaN   1.01  7.11

# display(ex_cols)
  ex.availableToLay ex.tradedVolume  price  size
0               NaN             NaN   1.02  2.15
1               NaN             NaN   1.01  7.11
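If a few market-level fields are wanted next to the runner rows, json_normalize can carry them along through its meta parameter. The following is a minimal sketch on a trimmed copy of the question's data; the choice of 'marketId' and 'status' as meta fields and the 'runners_' prefix are assumptions for illustration. The prefix is there because 'status' exists at both levels and the names would otherwise conflict.

```python
import pandas as pd

# trimmed copy of the question's data
data = [{
    'marketId': '1.123264244',
    'status': 'OPEN',
    'runners': [
        {'selectionId': 10861647, 'status': 'ACTIVE',
         'ex': {'availableToBack': [{'price': 1.02, 'size': 2.15}],
                'availableToLay': [], 'tradedVolume': []}},
        {'selectionId': 10861648, 'status': 'ACTIVE',
         'ex': {'availableToBack': [{'price': 1.01, 'size': 7.11}],
                'availableToLay': [], 'tradedVolume': []}},
    ],
}]

# one row per runner; market-level fields are repeated on each row
runners = pd.json_normalize(
    data,
    record_path=['runners'],
    meta=['marketId', 'status'],
    record_prefix='runners_',  # avoids the clash between the two 'status' keys
)
print(runners.columns.tolist())
```

The 'runners_ex.*' columns still hold lists at this point, so the explode and flatten steps shown above would apply to them with the prefixed names.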

data

data =\
[{'betDelay': 0,
  'bspReconciled': False,
  'complete': True,
  'crossMatching': False,
  'inplay': False,
  'isMarketDataDelayed': False,
  'lastMatchTime': '2016-02-25T10:32:25.704Z',
  'marketId': '1.123264244',
  'numberOfActiveRunners': 9,
  'numberOfRunners': 9,
  'numberOfWinners': 1,
  'runners': [{'adjustmentFactor': 16.631,
               'ex': {'availableToBack': [{'price': 1.02, 'size': 2.15}],
                      'availableToLay': [],
                      'tradedVolume': []},
               'handicap': 0.0,
               'selectionId': 10861647,
               'status': 'ACTIVE',
               'totalMatched': 0.0},
              {'adjustmentFactor': 13.237,
               'ex': {'availableToBack': [{'price': 1.01, 'size': 7.11}],
                      'availableToLay': [],
                      'tradedVolume': []},
               'handicap': 0.0,
               'selectionId': 10861648,
               'status': 'ACTIVE',
               'totalMatched': 0.0}],
  'runnersVoidable': False,
  'status': 'OPEN',
  'totalAvailable': 39.26,
  'totalMatched': 4.22,
  'version': 1241856317}]