I have a DataFrame (df) of hourly readings for several pollutants, covering 2001 to 2018. The df looks like this:
date O_3 NO_2 SO_2 PM10 PM25 CO
0 2001-01-01 01:00:00 7.86 67.120003 26.459999 32.349998 12.505127 0.45
1 2001-01-01 02:00:00 7.21 70.620003 20.879999 40.709999 12.505127 0.48
2 2001-01-01 03:00:00 7.11 72.629997 21.580000 50.209999 12.505127 0.41
3 2001-01-01 04:00:00 7.14 75.029999 19.270000 54.880001 12.505127 0.51
4 2001-01-01 05:00:00 8.46 66.589996 13.640000 42.340000 12.505127 0.19
5 2018-04-30 20:00:00 63.00 58.000000 4.000000 2.000000 2.000000 0.30
6 2018-04-30 21:00:00 49.00 65.000000 4.000000 5.000000 4.000000 0.30
7 2018-04-30 22:00:00 49.00 58.000000 4.000000 5.000000 3.000000 0.30
8 2018-04-30 23:00:00 48.00 52.000000 4.000000 7.000000 7.000000 0.30
9 2018-05-01 00:00:00 52.00 43.000000 4.000000 6.000000 4.000000 0.30
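I want to compute the mean of each column per day, taken over that day's hourly readings. In other words, for 2001-01-01, compute the mean over hours 01 to 05. The df above is only a small example; the real df has up to 24 hourly readings per day, although some days have fewer. Once I have the mean of each column, I will evaluate each row to obtain a label.
The df has the following columns: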
Index(['date', 'O_3', 'NO_2', 'SO_2', 'PM10', 'PM25', 'CO', 'Label'], dtype='object')
NaN counts per column:
date 0
O_3 0
NO_2 0
SO_2 0
PM10 0
PM25 0
CO 0
Label 0
dtype: int64
General info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 139608 entries, 0 to 139607
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 139608 non-null datetime64[ns]
1 O_3 139608 non-null float64
2 NO_2 139608 non-null float64
3 SO_2 139608 non-null float64
4 PM10 139608 non-null float64
5 PM25 139608 non-null float64
6 CO 139608 non-null float64
7 Label 139608 non-null float64
dtypes: datetime64[ns](1), float64(7)
To group by date, I tried the following:
day_df = hour_df.groupby([hour_df.date.dt.strftime('%Y-%m-%d')]).mean()
However, I'm not sure this is the right approach. If I look at the info of the resulting df, I get:
<class 'pandas.core.frame.DataFrame'>
Index: 5824 entries, 2001-01-01 to 2018-05-01
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 O_3 5824 non-null float64
1 NO_2 5824 non-null float64
2 SO_2 5824 non-null float64
3 PM10 5824 non-null float64
4 PM25 5824 non-null float64
5 CO 5824 non-null float64
6 Label 5824 non-null float64
dtypes: float64(7)
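As you can see, not every day has 24 hours of pollutant readings; otherwise there would be 6329 entries rather than only 5824. That's why I'm not sure whether I'm computing the mean correctly.
I'd really like to know the right way to do what I'm after.
Answer 0 (score: 3)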
Convert the `date` column to a pandas `datetime` column. Then group on the date portion (`year` through `day`), ignoring the `hour` portion, to get the `mean`.
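The answer's original code did not survive, so the following is only a minimal sketch of the approach it describes, not a reconstruction of that code. It assumes the frame is named `hour_df` as in the question and uses a tiny stand-in with two pollutant columns; `readings_per_day` is a hypothetical helper name added here to show how many hourly readings feed each daily mean.

```python
import pandas as pd

# Tiny stand-in for the hourly frame (a few values taken from the question's sample).
hour_df = pd.DataFrame({
    "date": ["2001-01-01 01:00:00", "2001-01-01 02:00:00", "2001-01-01 03:00:00",
             "2018-04-30 22:00:00", "2018-04-30 23:00:00", "2018-05-01 00:00:00"],
    "O_3": [7.86, 7.21, 7.11, 49.0, 48.0, 52.0],
    "CO": [0.45, 0.48, 0.41, 0.30, 0.30, 0.30],
})

# Make sure the column is a real datetime64 column (a no-op if it already is).
hour_df["date"] = pd.to_datetime(hour_df["date"])

# Group on the calendar date only; the time of day is dropped from the key.
grouped = hour_df.groupby(hour_df["date"].dt.date)

# Daily mean of each pollutant column, over however many hours that day has.
day_df = grouped[["O_3", "CO"]].mean()

# Hypothetical extra: count of hourly readings behind each daily mean.
readings_per_day = grouped.size()

print(day_df)
print(readings_per_day)
```

A day with fewer than 24 readings is averaged over the readings it does have, and a day with no readings at all simply does not appear in the result, which is one plausible reason for a lower row count than the number of calendar days in the range. Grouping on `dt.strftime('%Y-%m-%d')` as in the question should produce the same groups, with string labels instead of date objects in the index.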