I can't group my samples by hour. The data structure looks like this:
data = [
    {
        "pressure": "1009.7",
        "timestamp": "2019-09-03 08:03:00"
    },
    {
        "pressure": "1009.7",
        "timestamp": "2019-09-03 08:18:00"
    },
    {
        "pressure": "1009.8",
        "timestamp": "2019-09-03 08:33:00"
    },
    {
        "pressure": "1009.8",
        "timestamp": "2019-09-03 08:56:00"
    },
    {
        "pressure": "1009.8",
        "timestamp": "2019-09-03 09:03:00"
    },
    {
        "pressure": "1009.8",
        "timestamp": "2019-09-03 09:18:00"
    },
    {
        "pressure": "1009.8",
        "timestamp": "2019-09-03 09:33:00"
    },
    {
        "pressure": "1009.7",
        "timestamp": "2019-09-03 09:56:00"
    },
    {
        "pressure": "1009.6",
        "timestamp": "2019-09-03 10:03:00"
    }
]
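As you can see, pressure is measured four times per hour, and I want to compute the average for each hour. I tried to do this with Pandas, but had no luck. What I tried was extracting the start and end timestamps, rounding them to the full hour, and passing those to a DataFrame as the index with the JSON as the data, but the shapes don't match (no wonder). I thought I could load it into a df and then compute the mean, but it seems I need some intermediate steps.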
Answer 0 (score: 3)
If your JSON looks like the above, we can pass it straight into a DataFrame:
import pandas as pd

df = pd.DataFrame(data)
  pressure            timestamp
0   1009.7  2019-09-03 08:03:00
1   1009.7  2019-09-03 08:18:00
2   1009.8  2019-09-03 08:33:00
3   1009.8  2019-09-03 08:56:00
4   1009.8  2019-09-03 09:03:00
5   1009.8  2019-09-03 09:18:00
6   1009.8  2019-09-03 09:33:00
7   1009.7  2019-09-03 09:56:00
8   1009.6  2019-09-03 10:03:00
Then group by hour and take the mean pressure:
hourly_avg = df.groupby(df['timestamp'].dt.hour)['pressure'].mean()
print(hourly_avg)
timestamp
8     1009.750
9     1009.775
10    1009.600
Name: pressure, dtype: float64
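Note that the groupby only works once the timestamp column is a proper datetime and pressure has been cast to float, so run these two lines before grouping: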
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['pressure'] = df['pressure'].astype(float)
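Put in the order the steps have to run, the approach might look like the sketch below. It uses a shortened copy of the question's data plus one next-day sample that I made up for illustration. The `dt.floor("h")` variant is my own addition, not part of the answer: `dt.hour` keeps only the hour of day, so data spanning several days would be merged into the same group.

```python
import pandas as pd

# Shortened copy of the sample data from the question.
data = [
    {"pressure": "1009.7", "timestamp": "2019-09-03 08:03:00"},
    {"pressure": "1009.8", "timestamp": "2019-09-03 08:33:00"},
    {"pressure": "1009.8", "timestamp": "2019-09-03 09:03:00"},
    # Hypothetical next-day sample, added only to show the difference.
    {"pressure": "1009.6", "timestamp": "2019-09-04 09:03:00"},
]

df = pd.DataFrame(data)

# Convert the string columns first; .dt is only available on datetime columns.
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["pressure"] = df["pressure"].astype(float)

# Hour of day only: both 09:03 samples fall into the same group.
by_hour_of_day = df.groupby(df["timestamp"].dt.hour)["pressure"].mean()

# Floored to the hour: the date is kept, so each calendar hour is its own group.
by_calendar_hour = df.groupby(df["timestamp"].dt.floor("h"))["pressure"].mean()

print(by_hour_of_day)
print(by_calendar_hour)
```

For a single day of data, as in the question, both groupings give the same averages.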
Answer 1 (score: 1)
I would solve this by building a new dictionary keyed by date and hour, with a list of pressures as each value.
d = {}
for _dict in data:
    key = _dict['timestamp'][:13]  # 2019-09-03 08, etc.
    d.setdefault(key, []).append(float(_dict['pressure']))

for key, array in d.items():
    print(key, format(sum(array) / len(array), '.3f'))
Prints:
2019-09-03 08 1009.750
2019-09-03 09 1009.775
2019-09-03 10 1009.600
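A variation on the same dictionary idea, as a rough sketch: it parses each timestamp instead of slicing the string, so a malformed timestamp raises an error rather than silently landing in an odd bucket. The function name `hourly_means` is hypothetical, and the data is a shortened copy of the question's list.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Shortened copy of the sample data from the question.
data = [
    {"pressure": "1009.7", "timestamp": "2019-09-03 08:03:00"},
    {"pressure": "1009.8", "timestamp": "2019-09-03 08:33:00"},
    {"pressure": "1009.8", "timestamp": "2019-09-03 09:03:00"},
    {"pressure": "1009.6", "timestamp": "2019-09-03 10:03:00"},
]


def hourly_means(samples):
    """Return {start of hour: mean pressure} for a list of sample dicts."""
    buckets = defaultdict(list)
    for sample in samples:
        ts = datetime.strptime(sample["timestamp"], "%Y-%m-%d %H:%M:%S")
        # Truncate to the start of the hour to get the grouping key.
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(float(sample["pressure"]))
    return {hour: mean(values) for hour, values in buckets.items()}


result = hourly_means(data)
for hour, avg in result.items():
    print(hour.strftime("%Y-%m-%d %H"), format(avg, ".3f"))
```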
Answer 2 (score: 1)
Try this:
import pandas as pd

df = pd.DataFrame(data)
# The format string must match the data, including the dashes in the date.
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S')
df['pressure'] = df['pressure'].astype(float)
df['hour'] = df['timestamp'].dt.hour
pressure = df.groupby('hour')['pressure'].mean()
print(pressure)
Output:
hour
8     1009.750
9     1009.775
10    1009.600
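Another option, which none of the answers above use, is pandas' `resample` on a datetime index. The sketch below uses a shortened copy of the question's data with the 09:xx samples left out on purpose, to show that `resample` also emits hours with no samples (as NaN), whereas `groupby` simply omits them.

```python
import pandas as pd

# Shortened copy of the sample data; the 09:xx samples are left out on purpose.
data = [
    {"pressure": "1009.7", "timestamp": "2019-09-03 08:03:00"},
    {"pressure": "1009.8", "timestamp": "2019-09-03 08:33:00"},
    {"pressure": "1009.6", "timestamp": "2019-09-03 10:03:00"},
]

df = pd.DataFrame(data)
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["pressure"] = df["pressure"].astype(float)

# resample needs a datetime index; "h" creates one bin per calendar hour.
hourly = df.set_index("timestamp")["pressure"].resample("h").mean()
print(hourly)

# Drop the empty bins if only hours with measurements are wanted.
print(hourly.dropna())
```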