I have the following DataFrame and want to create a plot whose title is the date, with time on the x-axis and µmoles on the y-axis:
0 2019-06-11 17:21:35 13.5
1 2019-06-11 17:22:35 13.1
2 2019-06-11 17:23:35 13.0
3 2019-06-11 17:24:35 11.8
4 2019-06-11 17:25:35 11.8
... ... ... ...
394 2019-06-11 23:55:38 0.0
395 2019-06-11 23:56:38 0.0
396 2019-06-11 23:57:38 0.0
397 2019-06-11 23:58:38 0.0
398 2019-06-11 23:59:38 0.0
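I have written a few chunks of code that split the DataFrame into time slots and compute the average measurement for 5 pm, 6 pm, and so on. For example: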
seventeen = df.iloc[:39] # seventeen (for 5pm)
seventeen["\u03bcmoles"].mean()
six_pm = df.iloc[39:99] # six_pm (for 6pm)
six_pm["\u03bcmoles"].mean()
And so on.
I would like to draw a plot that uses these averages, with code along these lines:
df.plot(x ='Timestamp', y='\u03bcmoles', kind = 'line')
# all the slices for which I calculate the averages
datapoints = [seventeen, six_pm, seven, twenty_hundred, twenty_one, twenty_two, twenty_three]
plt.show()
Is there a way to achieve this?
Answer 0 (score: 0)
Consider aggregating by hour with pandas.Grouper
instead of computing each hourly average separately.
fig, ax = plt.subplots(figsize=(12,6))
df.plot(x ='Timestamp', y='\u03bcmoles', kind = 'line', ax=ax)
# Hourly means; the inplace argument of set_axis was removed in pandas 2.0, so it is omitted here
agg = (df.groupby(pd.Grouper(key='Timestamp', freq='h'))['\u03bcmoles'].mean()
         .reset_index()
         .set_axis(['Timestamp', 'mean_\u03bcmoles'], axis='columns')
      )
agg.plot(x='Timestamp', y='mean_\u03bcmoles', xticks=agg['Timestamp'].tolist(),
         kind='line', marker='o', color='green', ax=ax)
plt.show()
If you need only specific hours, filter the aggregated data by hour using .loc with .isin:
(agg.loc[agg['Timestamp'].dt.hour.isin([17, 18, 20, 21, 22, 23])]
    .plot(x='Timestamp', y='mean_\u03bcmoles',
          kind='line', marker='o', color='green', ax=ax)
)
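To see what the hourly aggregation and the hour filter return, here is a minimal sketch on a small, made-up DataFrame. The sample readings and the `wanted_hours` name are assumptions for illustration only and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical sample data: one reading every 20 minutes from 17:00 to 19:40.
df = pd.DataFrame({
    "Timestamp": pd.date_range("2019-06-11 17:00:00", periods=9, freq="20min"),
    "\u03bcmoles": [12.0, 13.0, 14.0, 9.0, 10.0, 11.0, 0.0, 3.0, 6.0],
})

# Hourly means, each labelled with the start of its hour.
agg = (
    df.groupby(pd.Grouper(key="Timestamp", freq="h"))["\u03bcmoles"]
      .mean()
      .rename("mean_\u03bcmoles")
      .reset_index()
)

# Keep only the hours of interest (17:00 and 19:00 in this sketch).
wanted_hours = [17, 19]
selected = agg.loc[agg["Timestamp"].dt.hour.isin(wanted_hours)]
print(selected)
```

With this sample, `agg` has one row per hour (17:00, 18:00, 19:00) and `selected` keeps the 17:00 and 19:00 rows, which can then be passed to `.plot(...)` as above.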
Demonstration with random data:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
### DATA BUILD
np.random.seed(10262020)
df = pd.DataFrame({'Timestamp': pd.to_datetime(1603670400 + np.random.randint(1, 86400, 500), unit='s'),
'\u03bcmoles': np.random.uniform(50, 100, 500)
}).sort_values('Timestamp')
### AGGREGATION BUILD
# The inplace argument of set_axis was removed in pandas 2.0, so it is omitted here
agg = (df.groupby(pd.Grouper(key='Timestamp', freq='h'))['\u03bcmoles'].mean()
         .reset_index()
         .set_axis(['Timestamp', 'mean_\u03bcmoles'], axis='columns')
      )
### PLOT BUILD
fig, ax = plt.subplots(figsize=(12,6))
df.plot(x ='Timestamp', y='\u03bcmoles', kind = 'line', ax=ax)
agg.plot(x='Timestamp', y='mean_\u03bcmoles',
xticks=agg['Timestamp'].tolist() + [agg['Timestamp'].dt.ceil(freq='d').max()],
kind='line', marker='o', color='green', ax=ax)
ax.xaxis.set_major_formatter(DateFormatter("%Y-%m-%d %H:%M:%S"))
plt.show()
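The question also asks for the date as the plot title and only the time on the x-axis. A possible sketch of that, assuming all readings fall on a single day, is shown here. The random data and the labels are made up for illustration, and it uses matplotlib's `ax.plot` directly rather than `DataFrame.plot`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.dates import DateFormatter

# Hypothetical data: one reading per minute from 17:00 to 23:59 on a single day.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Timestamp": pd.date_range("2019-06-11 17:00:00", periods=420, freq="min"),
    "\u03bcmoles": rng.uniform(0, 15, 420),
})

# Hourly means, each labelled with the start of its hour.
agg = (
    df.groupby(pd.Grouper(key="Timestamp", freq="h"))["\u03bcmoles"]
      .mean()
      .rename("mean_\u03bcmoles")
      .reset_index()
)

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df["Timestamp"], df["\u03bcmoles"], label="\u03bcmoles")
ax.plot(agg["Timestamp"], agg["mean_\u03bcmoles"],
        marker="o", color="green", label="hourly mean")

# Use the date of the first reading as the title (assumes a single-day dataset).
ax.set_title(df["Timestamp"].iloc[0].strftime("%Y-%m-%d"))

# Show only the time of day on the x-axis ticks.
ax.xaxis.set_major_formatter(DateFormatter("%H:%M"))
ax.set_xlabel("Time")
ax.set_ylabel("\u03bcmoles")
ax.legend()
fig.tight_layout()
# Call plt.show() or fig.savefig(...) afterwards to view the figure.
```

If the data spans several days, the title would need a different rule, for example a date range, so treat the single-day assumption as part of the sketch.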