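I have this JSON dataset. From it I only need the "column_names" key and its values and the "data" key and its values. Each entry in column_names corresponds, by position, to a value in every row of data. How can I combine just these two keys in Python for analysis?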
{"dataset":{"id":42635350,"dataset_code":"MSFT","column_names":
["Date","Open","High","Low","Close","Volume","Dividend","Split",
"Adj_Open","Adj_High","Adj_Low","Adj_Close","Adj_Volume"],
"frequency":"daily","type":"Time Series",
"data":[["2017-12-28",85.9,85.93,85.55,85.72,10594344.0,0.0,1.0,83.1976157998082,
83.22667201021558,82.85862667838872,83.0232785373639,10594344.0],
["2017-12-27",85.65,85.98,85.215,85.71,14678025.0,0.0,1.0,82.95548071308001,
83.27509902756123,82.53416566217294,83.01359313389476,14678025.0]
for cnames in data['dataset']['column_names']:
    print(cnames)
for cdata in data['dataset']['data']:
    print(cdata)
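The for loops give me the column names and data values I want, but I am not sure how to combine them into a Python DataFrame for analysis.
Reference: the code above comes from the Quandl website.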
Answer 0 (score: 1)
data = {
    "dataset": {
        "id":42635350,
        "dataset_code":"MSFT",
        "column_names": ["Date","Open","High","Low","Close","Volume","Dividend","Split","Adj_Open","Adj_High","Adj_Low","Adj_Close","Adj_Volume"],
        "frequency":"daily",
        "type":"Time Series",
        "data":[
            ["2017-12-28",85.9,85.93,85.55,85.72,10594344.0,0.0,1.0,83.1976157998082, 83.22667201021558,82.85862667838872,83.0232785373639,10594344.0],
            ["2017-12-27",85.65,85.98,85.215,85.71,14678025.0,0.0,1.0,82.95548071308001,83.27509902756123,82.53416566217294,83.01359313389476,14678025.0]
        ]
    }
}
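The code below should do what you want:
import pandas as pd
# Start from an empty frame that only defines the column labels.
df = pd.DataFrame(columns=data['dataset']['column_names'])
# Add each row of values under those labels, one row at a time.
for i, data_row in enumerate(data['dataset']['data']):
    df.loc[i] = data_row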
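This answer writes the payload as a Python literal. If it instead arrives as JSON text (for example, the body of an HTTP response), something like the following might be a reasonable first step. It is only a sketch: the `raw` string is a trimmed, hypothetical copy of the question's data with three columns kept, and the names `wanted`, `n_cols`, and `rows_ok` are made up for illustration.

```python
import json

# Assumption: the payload arrives as JSON text; this string is a trimmed,
# hypothetical copy of the question's data with only three columns kept.
raw = '''
{"dataset": {"id": 42635350, "dataset_code": "MSFT",
  "column_names": ["Date", "Open", "Close"],
  "frequency": "daily", "type": "Time Series",
  "data": [["2017-12-28", 85.9, 85.72],
           ["2017-12-27", 85.65, 85.71]]}}
'''

data = json.loads(raw)

# Keep only the two keys the question cares about.
wanted = {key: data["dataset"][key] for key in ("column_names", "data")}

# Sanity check: every row should have one value per column name.
n_cols = len(wanted["column_names"])
rows_ok = all(len(row) == n_cols for row in wanted["data"])
print(sorted(wanted), rows_ok)
```

Checking the row lengths up front tends to surface a malformed payload earlier than a confusing error from the DataFrame constructor would.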
Answer 1 (score: 0)
The following snippet should work for you:
import pandas as pd
df = pd.DataFrame(data['dataset']['data'],columns=data['dataset']['column_names'])
See the following link for more information: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
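Since the question mentions analysis, a likely next step is to parse the Date strings and use them as the index. The sketch below assumes a trimmed copy of the question's payload (three columns only, purely for brevity) and is one possible way to do it, not the only one.

```python
import pandas as pd

# Assumption: a trimmed copy of the question's payload, three columns only.
data = {
    "dataset": {
        "column_names": ["Date", "Open", "Close"],
        "data": [
            ["2017-12-28", 85.9, 85.72],
            ["2017-12-27", 85.65, 85.71],
        ],
    }
}

dataset = data["dataset"]

# Build the frame in one call: rows from "data", labels from "column_names".
df = pd.DataFrame(dataset["data"], columns=dataset["column_names"])

# Parse the date strings and use them as a chronologically sorted index.
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date").sort_index()

print(df)
```

With a datetime index in place, time-based operations such as resampling or date-range slicing become available on `df`.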
Answer 2 (score: 0)
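cols = data['dataset']['column_names']
rows = data['dataset']['data']  # separate name, so the original data dict is not overwritten
It's simple: pair each row's values with the column names.
labeled_data = [dict(zip(cols, row)) for row in rows]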
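To illustrate what the list of dicts gives you, here is a small sketch in plain Python with no pandas needed. The sample values are a trimmed subset of the question's data, and `closes` and `highest` are illustrative names, not part of the original answer.

```python
# Assumption: a trimmed subset of the question's columns and rows.
cols = ["Date", "Open", "Close"]
rows = [
    ["2017-12-28", 85.9, 85.72],
    ["2017-12-27", 85.65, 85.71],
]

# Pair each row's values with the column names, as in the answer.
labeled_data = [dict(zip(cols, row)) for row in rows]

# Each record can now be addressed by column name instead of position.
closes = [record["Close"] for record in labeled_data]
highest = max(labeled_data, key=lambda record: record["Close"])

print(closes)
print(highest["Date"])
```

If a DataFrame is wanted later, `pd.DataFrame(labeled_data)` also accepts a list of dicts like this one.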