pandas - converting a JSON-formatted string from a DataFrame

Time: 2019-11-18 13:21:45

Tags: python json pandas dataframe

This question may be complicated :-), and I have spent hours without finding an answer...

I have JSON-like data in one column of a DataFrame...

   population  postcode                                    salesGrowthList
0        3507      2250  [{'medianSoldPrice': 300000.0, 'annualGrowth':...
1        3507      2250  [{'medianSoldPrice': 353000.0, 'annualGrowth':...
2        3507      2250  [{'medianSoldPrice': 0.0, 'annualGrowth': 0.0,...
3        3507      2250  [{'medianSoldPrice': 0.0, 'annualGrowth': 0.0,...

The content of 'salesGrowthList' looks like the following... Each cell is a plain string, but the string is structured like JSON.

"[{'medianSoldPrice': 300000.0, 'annualGrowth': 0.0, 'numberSold': 19, 'year': 2014}, {'medianSoldPrice': 347000.0, 'annualGrowth': 0.15666666666666668, 'numberSold': 27, 'year': 2015}, {'medianSoldPrice': 371000.0, 'annualGrowth': 0.069164265129683, 'numberSold': 12, 'year': 2016}, {'medianSoldPrice': 410000.0, 'annualGrowth': 0.10512129380053908, 'numberSold': 15, 'year': 2017}, {'medianSoldPrice': 0.0, 'annualGrowth': 0.0, 'numberSold': 6, 'year': 2018}, {'medianSoldPrice': 411000.0, 'annualGrowth': 0.0, 'numberSold': 10, 'year': 2019}]"

Now I want to build a new DataFrame from this output. How can I do that?

1 answer:

Answer 0: (score: 0)

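You can load the string with the json module and then pass the result to pandas.DataFrame. The string uses single quotes, which JSON does not accept, so they are swapped for double quotes first.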

>>> import json
>>> import pandas as pd
>>> x
"[{'medianSoldPrice': 300000.0, 'annualGrowth': 0.0, 'numberSold': 19, 'year': 2014}, {'medianSoldPrice': 347000.0, 'annualGrowth': 0.15666666666666668, 'numberSold': 27, 'year': 2015}, {'medianSoldPrice': 371000.0, 'annualGrowth': 0.069164265129683, 'numberSold': 12, 'year': 2016}, {'medianSoldPrice': 410000.0, 'annualGrowth': 0.10512129380053908, 'numberSold': 15, 'year': 2017}, {'medianSoldPrice': 0.0, 'annualGrowth': 0.0, 'numberSold': 6, 'year': 2018}, {'medianSoldPrice': 411000.0, 'annualGrowth': 0.0, 'numberSold': 10, 'year': 2019}]"
>>> d = json.loads(x.replace("'", '"'))
>>> df = pd.DataFrame(d)
>>> df
   medianSoldPrice  annualGrowth  numberSold  year
0         300000.0      0.000000          19  2014
1         347000.0      0.156667          27  2015
2         371000.0      0.069164          12  2016
3         410000.0      0.105121          15  2017
4              0.0      0.000000           6  2018
5         411000.0      0.000000          10  2019
>>> 
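
The quote replacement above works for this sample, but it would break if a value ever contained an apostrophe or a Python-only literal such as None or True. Since the string is really a Python literal rather than strict JSON, one alternative is ast.literal_eval from the standard library. This is a minimal sketch, not part of the original answer; the sample string is shortened to the first two records from the question.

```python
import ast

import pandas as pd

# First two records of one 'salesGrowthList' cell, copied from the question.
x = (
    "[{'medianSoldPrice': 300000.0, 'annualGrowth': 0.0, 'numberSold': 19, 'year': 2014}, "
    "{'medianSoldPrice': 347000.0, 'annualGrowth': 0.15666666666666668, 'numberSold': 27, 'year': 2015}]"
)

# literal_eval only parses Python literals (lists, dicts, numbers, strings, ...);
# it does not execute arbitrary code, unlike eval.
records = ast.literal_eval(x)

# A list of dicts maps directly onto rows of a DataFrame.
df = pd.DataFrame(records)
print(df)
```

Whether this is preferable depends on where the strings come from: if they were produced by str() on Python objects, literal_eval matches that format more closely than json.loads does.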

Then put the parsed data back into the original DataFrame column, for example:

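>>> # parse every cell so the column holds lists of dicts instead of strings
>>> orig_df['salesGrowthList'] = orig_df['salesGrowthList'].apply(lambda s: json.loads(s.replace("'", '"')))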
>>>
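
If the goal is a single flat table covering every row of the original frame, with one output row per yearly record, something along these lines might work. It is only a sketch under assumptions: the two-row orig_df below is a hypothetical stand-in for the real data (the values after the first record are made up), and DataFrame.explode needs pandas 0.25 or later.

```python
import ast

import pandas as pd

# Hypothetical stand-in for the original frame; values are illustrative only.
orig_df = pd.DataFrame({
    "population": [3507, 3507],
    "postcode": [2250, 2250],
    "salesGrowthList": [
        "[{'medianSoldPrice': 300000.0, 'annualGrowth': 0.0, 'numberSold': 19, 'year': 2014}]",
        "[{'medianSoldPrice': 353000.0, 'annualGrowth': 0.0, 'numberSold': 5, 'year': 2014}, "
        "{'medianSoldPrice': 0.0, 'annualGrowth': 0.0, 'numberSold': 2, 'year': 2015}]",
    ],
})

# Parse each string cell into a list of dicts.
parsed = orig_df["salesGrowthList"].apply(ast.literal_eval)

# One row per dict; population and postcode are repeated for each record.
exploded = (
    orig_df.assign(salesGrowthList=parsed)
    .explode("salesGrowthList")
    .reset_index(drop=True)
)

# Turn the dicts into columns and attach them to the repeated row data.
details = pd.DataFrame(exploded["salesGrowthList"].tolist())
result = pd.concat([exploded.drop(columns="salesGrowthList"), details], axis=1)
print(result)
```

Keeping population and postcode next to each yearly record makes it possible to group or filter by postcode later without going back to the nested column.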