How to calculate the daily mean of each column in pandas?

Time: 2020-04-25 20:16:53

Tags: python pandas dataframe data-science

I have a dataframe (df) with hourly readings of several pollutants from 2001 to 2018. The df looks like this:

    date                    O_3     NO_2        SO_2        PM10        PM25        CO      
0   2001-01-01 01:00:00     7.86    67.120003   26.459999   32.349998   12.505127   0.45    
1   2001-01-01 02:00:00     7.21    70.620003   20.879999   40.709999   12.505127   0.48    
2   2001-01-01 03:00:00     7.11    72.629997   21.580000   50.209999   12.505127   0.41    
3   2001-01-01 04:00:00     7.14    75.029999   19.270000   54.880001   12.505127   0.51    
4   2001-01-01 05:00:00     8.46    66.589996   13.640000   42.340000   12.505127   0.19    
5   2018-04-30 20:00:00     63.00   58.000000   4.000000    2.000000    2.000000    0.30    
6   2018-04-30 21:00:00     49.00   65.000000   4.000000    5.000000    4.000000    0.30    
7   2018-04-30 22:00:00     49.00   58.000000   4.000000    5.000000    3.000000    0.30    
8   2018-04-30 23:00:00     48.00   52.000000   4.000000    7.000000    7.000000    0.30    
9   2018-05-01 00:00:00     52.00   43.000000   4.000000    6.000000    4.000000    0.30    

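I want to calculate the mean of each column over the hours of each day. In other words, for 2001-01-01 I want the mean of hours 01 through 05. The df above is only a small sample. The actual df has up to 24 hourly readings per day, although some days have fewer pollutant readings. Once I have the daily mean of each column, I will compute a value for each row to obtain a label.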
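Here is a sketch of the daily mean described above. It uses a handful of the sample rows, keeping only O_3 and NO_2 and rebuilding them by hand, so it is illustrative rather than the real data. dt.floor("D") truncates every timestamp to midnight, which gives all readings from one calendar day the same key. mean() then divides by however many readings that day actually has. With this key, the 2018-05-01 00:00:00 reading forms its own day rather than closing out 2018-04-30.

```python
import pandas as pd

# A few rows copied from the sample in the question (O_3 and NO_2 only).
hour_df = pd.DataFrame({
    "date": pd.to_datetime([
        "2001-01-01 01:00:00", "2001-01-01 02:00:00", "2001-01-01 03:00:00",
        "2001-01-01 04:00:00", "2001-01-01 05:00:00",
        "2018-04-30 20:00:00", "2018-04-30 21:00:00",
        "2018-04-30 22:00:00", "2018-04-30 23:00:00",
        "2018-05-01 00:00:00",
    ]),
    "O_3": [7.86, 7.21, 7.11, 7.14, 8.46, 63.0, 49.0, 49.0, 48.0, 52.0],
    "NO_2": [67.120003, 70.620003, 72.629997, 75.029999, 66.589996,
             58.0, 65.0, 58.0, 52.0, 43.0],
})

# Truncate each timestamp to midnight so every reading of a calendar day
# shares the same key, then average each pollutant column within the day.
day_key = hour_df["date"].dt.floor("D")
daily = hour_df.groupby(day_key)[["O_3", "NO_2"]].mean()

# Number of hourly readings that went into each daily mean.
readings_per_day = hour_df.groupby(day_key).size()

print(daily)
print(readings_per_day)
```

For 2001-01-01, the O_3 value is the average of the five readings 7.86, 7.21, 7.11, 7.14 and 8.46, which is 7.556. readings_per_day shows 5, 4 and 1 readings for the three days.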

The df has the following columns:

Index(['date', 'O_3', 'NO_2', 'SO_2', 'PM10', 'PM25', 'CO', 'Label'], dtype='object')

NaN counts per column:

date     0
O_3      0
NO_2     0
SO_2     0
PM10     0
PM25     0
CO       0
Label    0
dtype: int64

General info:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 139608 entries, 0 to 139607
Data columns (total 8 columns):
#   Column  Non-Null Count   Dtype         
---  ------  --------------   -----         
0   date    139608 non-null  datetime64[ns]
1   O_3     139608 non-null  float64       
2   NO_2    139608 non-null  float64       
3   SO_2    139608 non-null  float64       
4   PM10    139608 non-null  float64       
5   PM25    139608 non-null  float64       
6   CO      139608 non-null  float64       
7   Label   139608 non-null  float64       
dtypes: datetime64[ns](1), float64(7)

To group by date, I tried the following:

day_df = hour_df.groupby([hour_df.date.dt.strftime('%Y-%m-%d')]).mean()

But I'm not sure this is the right approach. If I look at the resulting df info, I get:

<class 'pandas.core.frame.DataFrame'>
Index: 5824 entries, 2001-01-01 to 2018-05-01
Data columns (total 7 columns):
#   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
0   O_3     5824 non-null   float64
1   NO_2    5824 non-null   float64
2   SO_2    5824 non-null   float64
3   PM10    5824 non-null   float64
4   PM25    5824 non-null   float64
5   CO      5824 non-null   float64
6   Label   5824 non-null   float64
dtypes: float64(7)

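As you can see, not every day has 24 hours of pollutant readings. Otherwise there would be 6329 entries instead of only 5824. That is why I'm not sure the mean is being calculated correctly.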
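On the entry count: groupby(...).mean() returns one row for each calendar day that has at least one reading. Each row averages the readings that day actually has, so a day with fewer than 24 readings still produces a row. A row count below the number of calendar days therefore means that some days have no readings at all. Below is a minimal sketch that checks both cases. The frame is hypothetical and made up for illustration.

```python
import pandas as pd

# Hypothetical hourly data: 2001-01-01 has 3 readings, 2001-01-02 has none,
# and 2001-01-03 has 2 readings.
hour_df = pd.DataFrame({
    "date": pd.to_datetime([
        "2001-01-01 01:00", "2001-01-01 02:00", "2001-01-01 03:00",
        "2001-01-03 10:00", "2001-01-03 11:00",
    ]),
    "O_3": [6.0, 7.0, 8.0, 20.0, 30.0],
})

day_key = hour_df["date"].dt.floor("D")

# How many hourly readings each day actually has. Days with fewer than 24
# readings still get a mean, computed over the readings that exist.
counts = hour_df.groupby(day_key).size()
incomplete_days = counts[counts < 24]

# Calendar days in the covered range with no readings at all; these are
# the days that are absent from the groupby result.
all_days = pd.date_range(day_key.min(), day_key.max(), freq="D")
missing_days = all_days.difference(counts.index)

print(counts)
print(missing_days)
```

On the real data, len(missing_days) should account for the gap between the number of calendar days in the range and the number of rows in the grouped result.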

I would really like to know the correct way to do what I'm looking for.

1 answer:

Answer 0 (score: 3)

Convert the date column to a pandas datetime column. Then group by the year, month and day parts, ignoring the hour part, to get the mean.
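Here is a minimal sketch of that approach. It assumes the date column may arrive as strings. In the question the column is already datetime64[ns], in which case pd.to_datetime leaves it unchanged. Grouping on dt.date keeps the year, month and day and drops the hour. The strftime('%Y-%m-%d') key from the question gives the same means, only with string labels.

As an alternative, resample("D") is also shown. Unlike groupby, resample emits a row for every calendar day in the range, with NaN for days that have no readings, which is why dropna is applied. The column names and values below are hypothetical.

```python
import pandas as pd

# Small hypothetical frame with the date stored as text.
df = pd.DataFrame({
    "date": ["2018-04-30 22:00:00", "2018-04-30 23:00:00", "2018-05-01 00:00:00",
             "2018-05-01 01:00:00", "2018-05-03 05:00:00"],
    "O_3": [49.0, 48.0, 52.0, 50.0, 40.0],
    "CO": [0.3, 0.3, 0.3, 0.5, 0.2],
})

# 1. Convert the date column to a proper datetime column.
df["date"] = pd.to_datetime(df["date"])

pollutants = ["O_3", "CO"]

# 2. Group on the calendar date (year, month, day), ignoring the hour,
#    and average every pollutant column within each day.
daily_mean = df.groupby(df["date"].dt.date)[pollutants].mean()

# Alternative: resample into daily bins. This creates a row for every day in
# the range (all-NaN where there are no readings), so drop those rows to
# match the groupby result.
daily_resampled = (
    df.set_index("date")[pollutants].resample("D").mean().dropna(how="all")
)

print(daily_mean)
print(daily_resampled)
```

Both results contain one row per day that has readings: 2018-04-30, 2018-05-01 and 2018-05-03. 2018-05-02, which has no readings, appears in neither.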