Plotting a graph with averages

Time: 2020-10-26 17:30:41

Tags: python pandas matplotlib

I have the following dataframe and want to create a plot whose title is the date, with time on the x-axis and µmoles on the y-axis:

0   2019-06-11  17:21:35    13.5
1   2019-06-11  17:22:35    13.1
2   2019-06-11  17:23:35    13.0
3   2019-06-11  17:24:35    11.8
4   2019-06-11  17:25:35    11.8
... ... ... ...
394 2019-06-11  23:55:38    0.0
395 2019-06-11  23:56:38    0.0
396 2019-06-11  23:57:38    0.0
397 2019-06-11  23:58:38    0.0
398 2019-06-11  23:59:38    0.0

I have written a few blocks that slice the dataframe into time slots and compute the mean measurement for 5 pm, 6 pm, and so on. For example:

seventeen = df.iloc[:39]  # seventeen (for 5pm)
seventeen["\u03bcmoles"].mean()

six_pm = df.iloc[39:99]   # six_pm (for 6pm)
six_pm["\u03bcmoles"].mean()

And so on.

I would like to plot a graph that uses these averages, with code along these lines:

df.plot(x ='Timestamp', y='\u03bcmoles', kind = 'line')
datapoints = (seventeen, six_pm, seven, twenty_hundred, twenty_one, twenty_two, twenty_three)  # all the data points for which I calculate the averages
plt.show()

Is there a way to achieve this?

1 Answer:

Answer 0: (score: 0)

Consider aggregating by hour with pandas.Grouper instead of computing each hourly average separately.

fig, ax = plt.subplots(figsize=(12,6))
df.plot(x ='Timestamp', y='\u03bcmoles', kind = 'line', ax=ax)

agg = (df.groupby(pd.Grouper(key='Timestamp', freq='h'))['\u03bcmoles'].mean()
         .reset_index()
         .set_axis(['Timestamp', 'mean_\u03bcmoles'], axis='columns')
      )

agg.plot(x='Timestamp', y='mean_\u03bcmoles', xticks=agg['Timestamp'].tolist(),
         kind='line', marker='o', color='green', ax=ax)

plt.show()
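
To see what the hourly aggregation returns, here is a minimal sketch on a made-up five-row frame. The column names "Date" and "Time" are assumptions, since the question does not show its headers; the sketch first combines them into the single datetime column that pd.Grouper needs.

```python
import pandas as pd

# Hypothetical column names ("Date", "Time") and made-up readings.
raw = pd.DataFrame({
    "Date": ["2019-06-11"] * 5,
    "Time": ["17:21:35", "17:22:35", "18:05:00", "18:35:00", "20:10:00"],
    "\u03bcmoles": [13.5, 13.1, 12.0, 10.0, 4.0],
})

# Combine the two text columns into one datetime column.
raw["Timestamp"] = pd.to_datetime(raw["Date"] + " " + raw["Time"])

# One row per clock hour, holding the mean of the readings in that hour.
hourly = (raw.groupby(pd.Grouper(key="Timestamp", freq="h"))["\u03bcmoles"]
             .mean()
             .reset_index(name="mean_\u03bcmoles"))
print(hourly)
```

Note that the grouper emits a row for every hour between the first and last reading, so the empty 19:00 bin shows up with NaN; hourly.dropna() removes such rows if they are not wanted.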

If you need only specific hours, filter the aggregated data by hour using .loc with .isin:

(agg.loc[agg['Timestamp'].dt.hour.isin([17, 18, 20, 21, 22, 23])]
    .plot(x='Timestamp', y='mean_\u03bcmoles', 
          kind='line', marker='o', color='green', ax=ax)
)
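
The filter itself can be checked without plotting. A small sketch, where agg_demo is a made-up stand-in for the agg frame and its values are invented for illustration:

```python
import pandas as pd

# Made-up hourly means from 17:00 through 23:00 (seven rows).
agg_demo = pd.DataFrame({
    "Timestamp": pd.date_range("2019-06-11 17:00", periods=7, freq="h"),
    "mean_\u03bcmoles": [12.6, 9.8, 6.1, 3.4, 1.2, 0.3, 0.0],
})

wanted_hours = [17, 18, 20, 21, 22, 23]  # 19 is left out, as in the answer's list

# .dt.hour extracts the hour as an integer; .isin builds the boolean mask for .loc.
subset = agg_demo.loc[agg_demo["Timestamp"].dt.hour.isin(wanted_hours)]
print(subset)
```

The 19:00 row is dropped and the other six rows are kept, so a line drawn through subset connects 18:00 directly to 20:00.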

Demonstration with random data:

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter

### DATA BUILD
np.random.seed(10262020)
df = pd.DataFrame({'Timestamp': pd.to_datetime(1603670400 + np.random.randint(1, 86400, 500), unit='s'),
                   '\u03bcmoles': np.random.uniform(50, 100, 500)
                  }).sort_values('Timestamp')

### AGGREGATION BUILD
agg = (df.groupby(pd.Grouper(key='Timestamp', freq='h'))['\u03bcmoles'].mean()
         .reset_index()
         .set_axis(['Timestamp', 'mean_\u03bcmoles'], axis='columns')
      )
      
### PLOT BUILD
fig, ax = plt.subplots(figsize=(12,6))
df.plot(x ='Timestamp', y='\u03bcmoles', kind = 'line', ax=ax)

agg.plot(x='Timestamp', y='mean_\u03bcmoles', 
         xticks=agg['Timestamp'].tolist() + [agg['Timestamp'].dt.ceil(freq='D').max()],
         kind='line', marker='o', color='green', ax=ax)

ax.xaxis.set_major_formatter(DateFormatter("%Y-%m-%d %H:%M:%S"))

plt.show()
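
The question also asks for the date as the plot title and only the time on the x-axis. One possible way to do that is sketched below, assuming all readings fall on a single day; the data is made up, and the "%H:%M" tick format is a choice, not something taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.dates import DateFormatter

# Made-up readings, one per minute for two hours.
rng = np.random.default_rng(0)
ts = pd.date_range("2019-06-11 17:21:35", periods=120, freq="min")
demo = pd.DataFrame({"Timestamp": ts, "\u03bcmoles": rng.uniform(0, 14, len(ts))})

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(demo["Timestamp"], demo["\u03bcmoles"])

# Date goes into the title; tick labels show the time of day only.
ax.set_title(demo["Timestamp"].min().strftime("%Y-%m-%d"))
ax.xaxis.set_major_formatter(DateFormatter("%H:%M"))
ax.set_xlabel("Time")
ax.set_ylabel("\u03bcmoles")
```

Call plt.show() or fig.savefig(...) afterwards to view the result. The sketch draws with ax.plot rather than df.plot so that the x-axis stays in matplotlib's own date units, which is what DateFormatter expects.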
