我在下面有以下df1,显示hhmm次。这些值表示文字时间,但格式不正确。例如。 845应该是08:45,而1125 = 11:25。
CU Parameters 31-07-2017 01-08-2017 02-08-2017 03-08-2017
CU0111-039820-L Time of Full Charge 1125 0 1359 1112
CU0111-041796-H Time of Full Charge 1233 0 0 1135
CU0111-046907-0 Time of Full Charge 845 0 1229 1028
CU0111-046933-6 Time of Full Charge 1053 0 0 1120
CU0111-050103-K Time of Full Charge 932 0 1314 1108
CU0111-052525-J Time of Full Charge 1214 1424 1307 1254
CU0111-052534-M Time of Full Charge 944 0 0 1128
CU0111-052727-7 Time of Full Charge 1136 0 1443 1114
我需要将所有这些值转换为hh:mm的有效时间戳,然后计算这些时间戳的平均值,不包括' 0'的值。
CU Parameters 31-07-2017 01-08-2017 02-08-2017 03-08-2017
CU0111-039820-L Time of Full Charge 11:25 0 13:59 11:12
CU0111-041796-H Time of Full Charge 12:33 0 0 11:35
CU0111-046907-0 Time of Full Charge 08:45 0 12:29 10:28
CU0111-046933-6 Time of Full Charge 10:53 0 0 11:20
CU0111-050103-K Time of Full Charge 09:32 0 13:14 11:08
CU0111-052525-J Time of Full Charge 12:14 14:24 13:07 12:54
CU0111-052534-M Time of Full Charge 09:44 0 0 11:28
CU0111-052727-7 Time of Full Charge 11:36 0 14:43 11:14
最终结果:
Average time of charge: hh:hh (excluding 0 values)
Number of no charges: =count(number of 0)
我尝试了这些方面的东西,但无济于事:
text = df1[col_list].astype(str)
df1[col_list] = text.str[:-2] + ':' + text.str[-2:]
hhmm = df1[col_list]
minutes = (hhmm / 100).astype(int) * 60 + hhmm % 100
df[col_list] = pd.to_timedelta(minutes, 'm')
答案 0 :(得分:2)
我认为您可以先转换所有值to_timedelta
:
cols = df.columns.difference(['CU','Parameters'])
df[cols] = df[cols].replace(0, '0000')
.astype(str)
.apply(lambda x: pd.to_timedelta(x.str[:-2] + ':' + x.str[-2:] + ':00'))
print (df)
CU Parameters 31-07-2017 01-08-2017 02-08-2017 \
0 CU0111-039820-L Time of Full Charge 11:25:00 00:00:00 13:59:00
1 CU0111-041796-H Time of Full Charge 12:33:00 00:00:00 00:00:00
2 CU0111-046907-0 Time of Full Charge 08:45:00 00:00:00 12:29:00
3 CU0111-046933-6 Time of Full Charge 10:53:00 00:00:00 00:00:00
4 CU0111-050103-K Time of Full Charge 09:32:00 00:00:00 13:14:00
5 CU0111-052525-J Time of Full Charge 12:14:00 14:24:00 13:07:00
6 CU0111-052534-M Time of Full Charge 09:44:00 00:00:00 00:00:00
7 CU0111-052727-7 Time of Full Charge 11:36:00 00:00:00 14:43:00
03-08-2017
0 11:12:00
1 11:35:00
2 10:28:00
3 11:20:00
4 11:08:00
5 12:54:00
6 11:28:00
7 11:14:00
然后为平均值而不是null timedeltas创建新列,并将0
计为True
值的总和:
df['avg'] = df[cols][df[cols].ne(0)].mean(axis=1)
df['number no changes'] = df[cols].eq(0).sum(axis=1)
print (df)
CU Parameters 31-07-2017 01-08-2017 02-08-2017 \
0 CU0111-039820-L Time of Full Charge 11:25:00 00:00:00 13:59:00
1 CU0111-041796-H Time of Full Charge 12:33:00 00:00:00 00:00:00
2 CU0111-046907-0 Time of Full Charge 08:45:00 00:00:00 12:29:00
3 CU0111-046933-6 Time of Full Charge 10:53:00 00:00:00 00:00:00
4 CU0111-050103-K Time of Full Charge 09:32:00 00:00:00 13:14:00
5 CU0111-052525-J Time of Full Charge 12:14:00 14:24:00 13:07:00
6 CU0111-052534-M Time of Full Charge 09:44:00 00:00:00 00:00:00
7 CU0111-052727-7 Time of Full Charge 11:36:00 00:00:00 14:43:00
03-08-2017 avg number no changes
0 11:12:00 12:12:00 1
1 11:35:00 12:04:00 2
2 10:28:00 10:34:00 1
3 11:20:00 11:06:30 2
4 11:08:00 11:18:00 1
5 12:54:00 13:09:45 0
6 11:28:00 10:36:00 2
7 11:14:00 12:31:00 1
print (df[cols][df[cols].ne(0)])
01-08-2017 02-08-2017 03-08-2017 31-07-2017
0 NaT 13:59:00 11:12:00 11:25:00
1 NaT NaT 11:35:00 12:33:00
2 NaT 12:29:00 10:28:00 08:45:00
3 NaT NaT 11:20:00 10:53:00
4 NaT 13:14:00 11:08:00 09:32:00
5 14:24:00 13:07:00 12:54:00 12:14:00
6 NaT NaT 11:28:00 09:44:00
7 NaT 14:43:00 11:14:00 11:36:00
print (df[cols].eq(0))
01-08-2017 02-08-2017 03-08-2017 31-07-2017
0 True False False False
1 True True False False
2 True False False False
3 True True False False
4 True False False False
5 False False False False
6 True True False False
7 True False False False