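This question may be complicated :-), and I have not been able to find an answer after several hours...
I have JSON-like data in one column of a data frame...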
population postcode salesGrowthList
0 3507 2250 [{'medianSoldPrice': 300000.0, 'annualGrowth':...
1 3507 2250 [{'medianSoldPrice': 353000.0, 'annualGrowth':...
2 3507 2250 [{'medianSoldPrice': 0.0, 'annualGrowth': 0.0,...
3 3507 2250 [{'medianSoldPrice': 0.0, 'annualGrowth': 0.0,...
The content of 'salesGrowthList' looks like the following... It is stored as a string, but the string is structured like JSON.
"[{'medianSoldPrice': 300000.0, 'annualGrowth': 0.0, 'numberSold': 19, 'year': 2014}, {'medianSoldPrice': 347000.0, 'annualGrowth': 0.15666666666666668, 'numberSold': 27, 'year': 2015}, {'medianSoldPrice': 371000.0, 'annualGrowth': 0.069164265129683, 'numberSold': 12, 'year': 2016}, {'medianSoldPrice': 410000.0, 'annualGrowth': 0.10512129380053908, 'numberSold': 15, 'year': 2017}, {'medianSoldPrice': 0.0, 'annualGrowth': 0.0, 'numberSold': 6, 'year': 2018}, {'medianSoldPrice': 411000.0, 'annualGrowth': 0.0, 'numberSold': 10, 'year': 2019}]"
Now I want to build a new data frame from this output. How can I do that?
Answer 0 (score: 0)
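You can use json to load the JSON string and then pass the result to pandas.DataFrame. Because the string uses single quotes, which are not valid JSON, replace them with double quotes first,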
>>> import json
>>> import pandas as pd
>>> x
"[{'medianSoldPrice': 300000.0, 'annualGrowth': 0.0, 'numberSold': 19, 'year': 2014}, {'medianSoldPrice': 347000.0, 'annualGrowth': 0.15666666666666668, 'numberSold': 27, 'year': 2015}, {'medianSoldPrice': 371000.0, 'annualGrowth': 0.069164265129683, 'numberSold': 12, 'year': 2016}, {'medianSoldPrice': 410000.0, 'annualGrowth': 0.10512129380053908, 'numberSold': 15, 'year': 2017}, {'medianSoldPrice': 0.0, 'annualGrowth': 0.0, 'numberSold': 6, 'year': 2018}, {'medianSoldPrice': 411000.0, 'annualGrowth': 0.0, 'numberSold': 10, 'year': 2019}]"
>>> d = json.loads(x.replace("'", '"'))
>>> df = pd.DataFrame(d)
>>> df
medianSoldPrice annualGrowth numberSold year
0 300000.0 0.000000 19 2014
1 347000.0 0.156667 27 2015
2 371000.0 0.069164 12 2016
3 410000.0 0.105121 15 2017
4 0.0 0.000000 6 2018
5 411000.0 0.000000 10 2019
>>>
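As an alternative sketch: the string in the question is really a Python literal (single-quoted keys) rather than strict JSON, so ast.literal_eval from the standard library can parse it directly. This avoids the quote replacement, which would break if any value ever contained an apostrophe. The variable name x and the shortened two-entry sample string are assumptions for illustration.

```python
import ast

import pandas as pd

# Shortened sample of the string from the question (first two entries only).
x = (
    "[{'medianSoldPrice': 300000.0, 'annualGrowth': 0.0, 'numberSold': 19, 'year': 2014}, "
    "{'medianSoldPrice': 347000.0, 'annualGrowth': 0.15666666666666668, 'numberSold': 27, 'year': 2015}]"
)

# literal_eval only evaluates Python literals (lists, dicts, numbers, strings),
# so it does not execute arbitrary code the way eval would.
records = ast.literal_eval(x)

# A list of dicts maps directly onto rows and columns.
df = pd.DataFrame(records)
print(df)
```

If the column really did hold strict JSON (double-quoted keys), json.loads alone would be the more natural choice.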
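Then, to store the parsed result back in the original data frame, parse each row of the column, for example,
>>> orig_df['salesGrowthList'] = orig_df['salesGrowthList'].apply(lambda s: json.loads(s.replace("'", '"')))
>>>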
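If the goal is a single data frame with one row per year for every original row, one possible sketch is to parse each string, explode the resulting lists, and join the expanded fields back onto the other columns. The orig_df below is a small hypothetical stand-in for the data frame in the question (its entries are borrowed from the sample string shown there), and the names parsed, exploded, details and long_df are illustrative only.

```python
import ast

import pandas as pd

# Hypothetical two-row stand-in for the original data frame.
orig_df = pd.DataFrame({
    "population": [3507, 3507],
    "postcode": [2250, 2250],
    "salesGrowthList": [
        "[{'medianSoldPrice': 300000.0, 'annualGrowth': 0.0, 'numberSold': 19, 'year': 2014}, "
        "{'medianSoldPrice': 347000.0, 'annualGrowth': 0.15666666666666668, 'numberSold': 27, 'year': 2015}]",
        "[{'medianSoldPrice': 371000.0, 'annualGrowth': 0.069164265129683, 'numberSold': 12, 'year': 2016}, "
        "{'medianSoldPrice': 410000.0, 'annualGrowth': 0.10512129380053908, 'numberSold': 15, 'year': 2017}]",
    ],
})

# Parse every string in the column into a list of dicts.
parsed = orig_df["salesGrowthList"].apply(ast.literal_eval)

# One row per dict; the original row index is repeated for each element.
exploded = parsed.explode()

# Expand the dicts into columns while keeping that repeated index.
details = pd.DataFrame(exploded.tolist(), index=exploded.index)

# Join on the index so population and postcode are carried onto every year.
long_df = orig_df.drop(columns="salesGrowthList").join(details)
print(long_df)
```

This sketch assumes every row holds a non-empty list; rows with empty lists or missing values would need to be filtered out before building details.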