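Previously, I read data from a CSV file and computed its minimum, maximum, and mean values. I am now trying to read the same data from a JSON file and write the output to CSV, but I don't understand how to do it. Any help is greatly appreciated. My JSON file looks like this: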
{
  "data": [
    {
      "time": "2015-10-14 15:01:10",
      "values": {
        "d1": 3956.58,
        "d2": 0,
        "d3": 19,
        "d4": 6.21,
        "d4": 105.99,
        "d5": 42,
        "d6": 59.24
      }
    },
    {
      "time": "2015-10-14 15:01:20",
      "values": {
        "d1": 3956.58,
        "d2": 0,
        "d3": 1,
        "d4": 0.81,
        "d5": 121.57,
        "d6": 42,
        "d7": 59.24
      } .. ..
My code so far is:
df = pd.read_json('data.json', convert_dates = True)
df['time'] = [pd.to_datetime(d) for d in df['time']]
df = df.set_index('time')
hourly_stats = d.groupby(pd.TimeGrouper('H'))
print((hourly_stats).agg([np.mean, np.min, np.max]))
((hourly_stats).agg([np.mean, np.min, np.max])).to_csv('file.csv')
Answer 0 (score: 2)
First, your JSON is not valid. Fix it and validate it before using it. After that, you can read the data in Python, for example with the standard json module.
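A minimal sketch of the validate-then-load step in Python. The helper name `load_records` and the shortened sample string are assumptions made for illustration; the key names `data`, `time`, and `values` are the ones the question's code expects.

```python
import json


def load_records(text):
    """Parse JSON text and return the list stored under "data".

    Raises ValueError with the parser's position info if the text is not valid JSON.
    """
    try:
        document = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            "invalid JSON at line {}, column {}: {}".format(err.lineno, err.colno, err.msg)
        ) from err
    return document["data"]


# Shortened, hypothetical sample with the same shape as the question's file.
sample = '''
{"data": [
    {"time": "2015-10-14 15:01:10", "values": {"d1": 3956.58, "d2": 0}},
    {"time": "2015-10-14 15:01:20", "values": {"d1": 3956.58, "d2": 0}}
]}
'''

records = load_records(sample)
first = records[0]             # "data" is a list, so index into it first
print(first["time"])           # then read the fields of a single record
print(first["values"]["d1"])
```

Note that `json.loads` does not reject a repeated key such as the second `d4` in the question's first record; it silently keeps the last value, so that kind of mistake has to be caught by inspecting the file.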
Answer 1 (score: 2)
I modified your JSON string slightly and added one more record so that there are distinct hourly groups.
import pandas as pd
import numpy as np
import json
jsondata = '''{
"data": [
{
"time": "2015-10-14 15:01:10",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 19,
"d4": 6.21,
"d5": 105.99,
"d6": 42,
"d7": 59.24
}
},
{
"time": "2015-10-14 15:01:20",
"values": {
"d1": 3956.58,
"d2": 0,
"d3": 1,
"d4": 0.81,
"d5": 121.57,
"d6": 42,
"d7": 59.24
}
},
{
"time": "2015-10-14 16:01:20",
"values": {
"d1": 31956.58,
"d2": 0,
"d3": 1,
"d4": 0.81,
"d5": 121.57,
"d6": 42,
"d7": 59.24
}
}
]
}
'''
data = json.loads(jsondata)['data']
#If your JSON data is in a file, then do:
#data = json.load(jsonfile)['data']
df = pd.DataFrame(data=[record['values'] for record in data],
                  index=pd.DatetimeIndex([record['time'] for record in data], name='time'))
print(df)
print(df.groupby(pd.Grouper(freq='H')).agg([np.mean, max, min]))
Output (df):
                           d1  d2  d3    d4      d5  d6     d7
time
2015-10-14 15:01:10   3956.58   0  19  6.21  105.99  42  59.24
2015-10-14 15:01:20   3956.58   0   1  0.81  121.57  42  59.24
2015-10-14 16:01:20  31956.58   0   1  0.81  121.57  42  59.24
Output (statistics):
                           d1                        d2                d3              \
                         mean       max       min  mean   max   min  mean   max   min
time
2015-10-14 15:00:00   3956.58   3956.58   3956.58     0     0     0    10    19     1
2015-10-14 16:00:00  31956.58  31956.58  31956.58     0     0     0     1     1     1

                       d4  ...            d5                    d6              \
                     mean  ...   min    mean     max     min  mean   max   min
time                       ...
2015-10-14 15:00:00  3.51  ...  0.81  113.78  121.57  105.99    42    42    42
2015-10-14 16:00:00  0.81  ...  0.81  121.57  121.57  121.57    42    42    42

                        d7
                      mean    max    min
time
2015-10-14 15:00:00  59.24  59.24  59.24
2015-10-14 16:00:00  59.24  59.24  59.24

[2 rows x 21 columns]
Using pd.read_json directly does not seem to work, because the resulting DataFrame has an unexpected structure that is hard to work with.
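One way to sidestep that nested structure is `pandas.json_normalize`, which flattens the `values` dictionaries into ordinary columns. The sketch below is an illustration rather than part of the original answer: the trimmed sample data and the output file name `hourly_stats.csv` are assumptions, and it requires pandas 1.0 or later.

```python
import json

import pandas as pd

# Trimmed, hypothetical sample with the same shape as the data above.
jsondata = '''{"data": [
  {"time": "2015-10-14 15:01:10", "values": {"d1": 3956.58, "d3": 19}},
  {"time": "2015-10-14 15:01:20", "values": {"d1": 3956.58, "d3": 1}},
  {"time": "2015-10-14 16:01:20", "values": {"d1": 31956.58, "d3": 1}}
]}'''

records = json.loads(jsondata)["data"]

# Nested keys become dotted column names: time, values.d1, values.d3
flat = pd.json_normalize(records)
flat.columns = [name.replace("values.", "") for name in flat.columns]

flat["time"] = pd.to_datetime(flat["time"])
flat = flat.set_index("time")

# "60min" gives hourly bins and avoids the 'H' alias, which newer pandas deprecates.
hourly = flat.resample("60min").agg(["mean", "min", "max"])
print(hourly)

# Assumed output path; the question asked for the statistics in a CSV file.
hourly.to_csv("hourly_stats.csv")
```

The resulting frame has one row per hour and a two-level column index (value name, statistic), which `to_csv` writes as two header rows.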
Answer 2 (score: 0)
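As you can see, "data" is actually an array; note the square bracket that follows it. So you need to take the first member of the array and then its time. Since the JSON is truncated, I will assume that all members of the array have the same structure. To access the value, you need something like data[0]['time'].
Answer 3 (score: 0)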
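Well, your actual code and the description of what you are trying to do look a bit different. Hopefully this helps: all you need to do is redefine the headers and put your business logic in the "json_to_writable_dict" function, and you should be good to go.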
import json
import csv


def to_csv(json_obj, fname='my_csv.csv'):
    # newline='' lets the csv module control line endings (avoids blank rows on Windows)
    with open(fname, 'w', newline='') as f:
        to_write = json_to_writable_dict(json_obj)
        fieldnames = ['time'] + ['d{}'.format(i) for i in range(1, 8)]
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in to_write:
            writer.writerow(row)
    return fname


def json_to_writable_dict(json_obj):
    # Flatten each record into one row: d1..d7 plus the timestamp
    data, values, time = 'data', 'values', 'time'
    json_dict = dict(json_obj)
    to_write = []
    for item in json_dict[data]:
        row = {'d{}'.format(i): item[values]['d{}'.format(i)] for i in range(1, 8)}
        row.update({'time': item[time]})
        to_write.append(row)
    return to_write


def main():
    s = '''{
    "data": [
        {
            "time": "2015-10-14 15:01:10",
            "values": {
                "d1": 3956.58,
                "d2": 0,
                "d3": 19,
                "d4": 6.21,
                "d5": 105.99,
                "d6": 42,
                "d7": 59.24
            }
        },
        {
            "time": "2015-10-14 15:01:20",
            "values": {
                "d1": 3956.58,
                "d2": 0,
                "d3": 1,
                "d4": 0.81,
                "d5": 121.57,
                "d6": 42,
                "d7": 59.24
            }
        }
    ]
}'''
    json_thing = json.loads(s)
    csv_name = to_csv(json_obj=json_thing)
    # Read the file back to show what was written
    with open(csv_name) as f:
        print(f.read())


if __name__ == '__main__':
    main()
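The code above writes the raw rows to CSV. If the per-hour minimum, maximum, and mean from the question are also needed without pandas, the business logic could look roughly like the sketch below. The function names `hourly_stats` and `write_stats`, the long (hour, key) row layout, and the trimmed sample records are all assumptions made for illustration.

```python
import csv
import io
import json
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_stats(records):
    """Group records by hour and return {hour: {key: (min, max, mean)}}."""
    buckets = defaultdict(lambda: defaultdict(list))
    for record in records:
        stamp = datetime.strptime(record["time"].strip(), "%Y-%m-%d %H:%M:%S")
        hour = stamp.replace(minute=0, second=0)  # truncate to the start of the hour
        for key, value in record["values"].items():
            buckets[hour][key].append(value)
    return {
        hour: {key: (min(vals), max(vals), mean(vals)) for key, vals in cols.items()}
        for hour, cols in sorted(buckets.items())
    }


def write_stats(stats, out):
    """Write one CSV row per (hour, key) pair to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["hour", "key", "min", "max", "mean"])
    for hour, cols in stats.items():
        for key, (lowest, highest, average) in sorted(cols.items()):
            writer.writerow([hour.isoformat(sep=" "), key, lowest, highest, average])


# Trimmed, hypothetical records; real input would come from json.load on the file.
records = json.loads('''[
  {"time": "2015-10-14 15:01:10", "values": {"d3": 19}},
  {"time": "2015-10-14 15:01:20", "values": {"d3": 1}},
  {"time": "2015-10-14 16:01:20", "values": {"d3": 1}}
]''')

stats = hourly_stats(records)
buffer = io.StringIO()  # stands in for open('stats.csv', 'w', newline='')
write_stats(stats, buffer)
print(buffer.getvalue())
```

Writing to a file-like object keeps the sketch easy to try out; passing a real file opened with `newline=''` produces the CSV on disk instead.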