(Edit: updated with more data, as requested.)
I have a pandas DataFrame with a few thousand rows that looks like this:
>>> y=x.groupby('wbdqueue_id')
>>> y.head()
id jname orderid wbdqueue_id platform_id \
59 1341127 ondemand_build_baspen-w7g 15 26581 1341122.0
60 1341126 ondemand_qa_qforchecka 41 26581 1341125.0
61 1341125 ondemand_build_bchecka 17 26581 1341123.0
63 1341123 ondemand_build_baspen-w7f 14 26581 1341122.0
64 1341122 ondemand_update_waspen-w7a 2 26581 1341073.0
116 1340927 qa_db_insertall 56 26578 NaN
117 1340926 ondemand_qa_qca-pc20rha 39 26578 1340925.0
118 1340925 ondemand_build_bca-pc20rha 16 26578 1340924.0
119 1340924 ondemand_update_wca-pc20rha 3 26578 1340871.0
120 1340923 ondemand_qa_qaspen-w7_qa2 35 26578 1340922.0
173 1340870 qa_db_insertall 56 26577 NaN
174 1340869 ondemand_qa_qtopia 52 26577 1340868.0
175 1340868 ondemand_build_btopia 33 26577 1340867.0
176 1340867 ondemand_update_wtopia 9 26577 1340814.0
177 1340866 ondemand_qa_qmoed 47 26577 1340865.0
230 1340813 qa_db_insertall 56 26576 NaN
231 1340812 ondemand_qa_qmoeb 46 26576 1340811.0
232 1340811 ondemand_build_bmoed 22 26576 1340810.0
233 1340810 ondemand_build_bmoee 23 26576 1340809.0
234 1340809 ondemand_update_wmoeb 5 26576 1340757.0
287 1340756 qa_db_insertall 56 26575 NaN
293 1340750 ondemand_qa_qca-pc20rha 39 26575 1340749.0
294 1340749 ondemand_build_bca-pc20rha 16 26575 1340748.0
295 1340748 ondemand_update_wca-pc20rha 3 26575 1340700.0
296 1340747 ondemand_qa_qvmwin7-64i 55 26575 1340746.0
344 1340699 qa_db_insertall 56 26574 NaN
345 1340698 ondemand_qa_qslotha 51 26574 1340697.0
346 1340697 ondemand_build_bmousef 28 26574 1340684.0
347 1340696 ondemand_qa_qboarb 38 26574 1340695.0
348 1340695 ondemand_build_bmouseg 29 26574 1340684.0
... ... ... ... ... ...
9659 1327031 qa_db_insertall 56 26311 NaN
9660 1327030 ondemand_qa_qforchecka 41 26311 1327029.0
9661 1327029 ondemand_build_bchecka 17 26311 1327027.0
9662 1327028 ondemand_qa_qaspen-w7_qa1 34 26311 1327027.0
9663 1327027 ondemand_build_baspen-w7f 14 26311 1327024.0
9716 1326974 qa_db_insertall 56 26310 NaN
9717 1326973 ondemand_qa_qmoeb 46 26310 1326972.0
9718 1326972 ondemand_build_bmoed 22 26310 1326971.0
9719 1326971 ondemand_build_bmoee 23 26310 1326970.0
9720 1326970 ondemand_update_wmoeb 5 26310 1326918.0
9773 1326917 qa_db_insertall 56 26309 NaN
9774 1326916 ondemand_qa_qtopia 52 26309 1326915.0
9775 1326915 ondemand_build_btopia 33 26309 1326914.0
9776 1326914 ondemand_update_wtopia 9 26309 1326861.0
9777 1326913 ondemand_qa_qaspen-w7_qa2 35 26309 1326912.0
9830 1326860 qa_db_insertall 56 26308 NaN
9831 1326859 ondemand_build_balder-w7d 12 26308 1326852.0
9832 1326858 ondemand_qa_qvmwin7-64i 55 26308 1326857.0
9833 1326857 ondemand_build_balder-w7h 13 26308 1326852.0
9834 1326856 ondemand_qa_qvmwin7-64d 54 26308 1326855.0
9887 1326803 qa_db_insertall 56 26307 NaN
9888 1326802 ondemand_qa_qaspen-w7_qa1 34 26307 1326799.0
9889 1326801 ondemand_qa_qforchecka 41 26307 1326800.0
9890 1326800 ondemand_build_bchecka 17 26307 1326799.0
9891 1326799 ondemand_build_baspen-w7f 14 26307 1326796.0
9944 1326746 qa_db_insertall 56 26306 NaN
9950 1326740 ondemand_qa_qkrakena 43 26306 1326737.0
9951 1326739 ondemand_qa_qkirina 42 26306 1326738.0
9952 1326738 ondemand_build_bkirina 18 26306 1326737.0
9953 1326737 ondemand_build_bkrakena 19 26306 1326736.0
startdatetime enddatetime runtime
59 2017-07-31 23:14:56 2017-07-31 23:19:12 00:04:16
60 2017-07-31 23:15:35 2017-07-31 23:34:12 00:18:37
61 2017-07-31 23:14:56 2017-07-31 23:15:30 00:00:34
63 2017-07-31 23:10:05 2017-07-31 23:14:56 00:04:51
64 2017-07-31 23:09:32 2017-07-31 23:10:00 00:00:28
116 2017-07-31 21:42:28 2017-07-31 21:42:55 00:00:27
117 2017-07-31 21:10:15 2017-07-31 21:17:46 00:07:31
118 2017-07-31 21:09:37 2017-07-31 21:10:10 00:00:33
119 2017-07-31 21:09:22 2017-07-31 21:09:32 00:00:10
120 2017-07-31 21:17:57 2017-07-31 21:33:22 00:15:25
173 2017-07-31 21:05:15 2017-07-31 21:05:17 00:00:02
174 2017-07-31 20:27:07 2017-07-31 20:27:19 00:00:12
175 2017-07-31 20:27:00 2017-07-31 20:27:02 00:00:02
176 2017-07-31 20:26:52 2017-07-31 20:26:54 00:00:02
177 2017-07-31 20:48:56 2017-07-31 20:48:59 00:00:03
230 2017-07-31 21:04:50 2017-07-31 21:05:15 00:00:25
231 2017-07-31 20:20:02 2017-07-31 20:23:24 00:03:22
232 2017-07-31 20:19:43 2017-07-31 20:19:57 00:00:14
233 2017-07-31 20:19:11 2017-07-31 20:19:38 00:00:27
234 2017-07-31 20:17:26 2017-07-31 20:18:49 00:01:23
287 2017-07-31 20:58:36 2017-07-31 20:59:00 00:00:24
293 2017-07-31 20:08:57 2017-07-31 20:16:33 00:07:36
294 2017-07-31 20:07:44 2017-07-31 20:08:52 00:01:08
295 2017-07-31 20:07:06 2017-07-31 20:07:39 00:00:33
296 2017-07-31 20:36:23 2017-07-31 20:36:58 00:00:35
344 2017-07-31 20:38:11 2017-07-31 20:38:35 00:00:24
345 2017-07-31 19:58:32 2017-07-31 20:00:04 00:01:32
346 2017-07-31 19:57:48 2017-07-31 19:58:26 00:00:38
347 2017-07-31 19:59:17 2017-07-31 20:04:59 00:05:42
348 2017-07-31 19:58:26 2017-07-31 19:59:12 00:00:46
... ... ... ...
9659 2017-07-18 18:44:05 2017-07-18 18:44:29 00:00:24
9660 2017-07-18 18:08:17 2017-07-18 18:25:53 00:17:36
9661 2017-07-18 18:07:35 2017-07-18 18:08:12 00:00:37
9662 2017-07-18 18:07:40 2017-07-18 18:15:01 00:07:21
9663 2017-07-18 18:03:20 2017-07-18 18:07:35 00:04:15
9716 2017-07-18 18:10:53 2017-07-18 18:11:15 00:00:22
9717 2017-07-18 17:40:52 2017-07-18 17:44:15 00:03:23
9718 2017-07-18 17:40:29 2017-07-18 17:40:48 00:00:19
9719 2017-07-18 17:39:51 2017-07-18 17:40:25 00:00:34
9720 2017-07-18 17:37:42 2017-07-18 17:39:23 00:01:41
9773 2017-07-18 16:45:15 2017-07-18 16:45:39 00:00:24
9774 2017-07-18 16:02:44 2017-07-18 16:06:24 00:03:40
9775 2017-07-18 16:02:09 2017-07-18 16:02:39 00:00:30
9776 2017-07-18 16:01:55 2017-07-18 16:02:04 00:00:09
9777 2017-07-18 16:10:46 2017-07-18 16:26:18 00:15:32
9830 2017-07-18 16:10:22 2017-07-18 16:10:46 00:00:24
9831 2017-07-18 15:41:47 2017-07-18 15:44:07 00:02:20
9832 2017-07-18 16:01:19 2017-07-18 16:01:55 00:00:36
9833 2017-07-18 15:44:07 2017-07-18 15:46:02 00:01:55
9834 2017-07-18 15:41:52 2017-07-18 15:42:36 00:00:44
9887 2017-07-18 15:02:56 2017-07-18 15:03:20 00:00:24
9888 2017-07-18 14:34:26 2017-07-18 14:41:15 00:06:49
9889 2017-07-18 14:35:03 2017-07-18 14:51:17 00:16:14
9890 2017-07-18 14:34:21 2017-07-18 14:34:58 00:00:37
9891 2017-07-18 14:30:03 2017-07-18 14:34:22 00:04:19
9944 2017-07-18 13:50:01 2017-07-18 13:50:25 00:00:24
9950 2017-07-18 13:18:38 2017-07-18 13:39:37 00:20:59
9951 2017-07-18 13:19:35 2017-07-18 13:24:42 00:05:07
9952 2017-07-18 13:18:38 2017-07-18 13:19:30 00:00:52
9953 2017-07-18 13:17:16 2017-07-18 13:18:33 00:01:17
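For each wbdqueue_id, I want the difference between the startdatetime of ondemand_update_waspen-w7a and the enddatetime of ondemand_build_baspen-w7g. With help from Stack Overflow, I learned how to use groupby with apply(myfunc) or a lambda. Below are the solutions that were proposed; either one would work fine if it weren't for missing data: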
df.set_index('jname').groupby('wbdqueue_id').apply(
    lambda x: x.at['ondemand_update_waspen-w7a', 'startdatetime']
              - x.at['ondemand_build_baspen-w7f', 'enddatetime'])
and
def get_time_diff(dff):
    start_time = dff[dff.jname.eq('ondemand_update_waspen-w7a')].startdatetime.values[0]
    end_time = dff[dff.jname.eq('ondemand_build_baspen-w7g')].enddatetime.values[0]
    return pd.Timedelta(end_time - start_time)
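But it seems that not every group has ondemand_update_waspen-w7a and/or ondemand_build_baspen-w7g, which makes the functions above fail. How can I skip or drop the groups that don't have the data I need? There doesn't seem to be an easy way to do this.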
Answer 0 (score: 1)
Adjust the function by adding a try-except so that it handles groups that lack the specified strings:
def get_time_diff(dff):
    try:
        start_time = dff[dff.jname.eq('ondemand_update_waspen-w7a')].startdatetime.values[0]
        end_time = dff[dff.jname.eq('ondemand_build_baspen-w7g')].enddatetime.values[0]
        return pd.Timedelta(end_time - start_time)
    except (KeyError, IndexError):
        # Either job name is missing from this group
        return 0
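One possible variation, which is not part of the answer above: returning 0 mixes an integer with Timedelta values, so a group with missing jobs looks like a zero-length interval. A minimal sketch that returns pd.NaT instead and then drops those groups with dropna() is shown below. The helper name get_time_diff_or_nat and its parameter names are illustrative, and the three-row frame is a small subset of the rows shown in the question.

```python
import pandas as pd

def get_time_diff_or_nat(dff,
                         start_job='ondemand_update_waspen-w7a',
                         end_job='ondemand_build_baspen-w7g'):
    """Return enddatetime(end_job) - startdatetime(start_job) for one group,
    or pd.NaT when either job is absent (hypothetical helper)."""
    starts = dff.loc[dff['jname'].eq(start_job), 'startdatetime']
    ends = dff.loc[dff['jname'].eq(end_job), 'enddatetime']
    if starts.empty or ends.empty:
        return pd.NaT
    return ends.iloc[0] - starts.iloc[0]

# Three rows copied from the question: group 26581 has both jobs, 26578 has neither.
df = pd.DataFrame({
    'wbdqueue_id': [26581, 26581, 26578],
    'jname': ['ondemand_build_baspen-w7g',
              'ondemand_update_waspen-w7a',
              'qa_db_insertall'],
    'startdatetime': pd.to_datetime(['2017-07-31 23:14:56',
                                     '2017-07-31 23:09:32',
                                     '2017-07-31 21:42:28']),
    'enddatetime': pd.to_datetime(['2017-07-31 23:19:12',
                                   '2017-07-31 23:10:00',
                                   '2017-07-31 21:42:55']),
})

# Select only the columns the helper reads, apply it per group,
# then drop the groups that produced NaT.
cols = ['jname', 'startdatetime', 'enddatetime']
diffs = df.groupby('wbdqueue_id')[cols].apply(get_time_diff_or_nat).dropna()
print(diffs)
```

Only group 26581 should remain in diffs, since 26578 has neither job in this subset.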
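For debugging purposes, I wrapped each indexing step of the function in its own try-except statement, so each one reports when the string it looks for is missing. I also changed the function's parameters so that you can swap the strings. This lets you pass any arbitrary string to the function for testing: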
def get_time_diff2(dff, str1='ondemand_update_waspen-w7a', str2='ondemand_build_baspen-w7g'):
    missing = ""
    try:
        start_time = dff[dff.jname.eq(str1)].startdatetime.values[0]
    except (KeyError, IndexError):
        missing += "{} is not in startdatetime. ".format(str1)
    try:
        end_time = dff[dff.jname.eq(str2)].enddatetime.values[0]
    except (KeyError, IndexError):
        missing += "{} is not in enddatetime".format(str2)
    if missing:
        return missing
    return pd.Timedelta(end_time - start_time)
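If you call this function on the DataFrame with two strings such as "hello" and "world", it tells you whether one or both of them are missing from each group that get_time_diff2 is applied to. For example: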
df.groupby('wbdqueue_id').apply(get_time_diff2, 'hello', 'world')
This should return:
# wbdqueue_id
# 26578 hello is not in startdatetime. world is not in enddatetime
# 26581 hello is not in startdatetime. world is not in enddatetime
# dtype: object
If only one of them is missing:
df.groupby('wbdqueue_id').apply(get_time_diff2, 'ondemand_update_waspen-w7a', 'hello')
This should return:
wbdqueue_id
26578 hello is not in enddatetime
26581 hello is not in enddatetime
dtype: object
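If the goal is to drop incomplete groups before computing anything, as the question asks, one option is groupby(...).filter(), which keeps only the rows of groups for which a predicate returns True. The following is a rough sketch under that assumption, not the method used in this answer. The names REQUIRED_JOBS and has_required_jobs are illustrative, and the frame is a small subset of the rows shown in the question.

```python
import pandas as pd

# Job names taken from the question; the constant name is illustrative.
REQUIRED_JOBS = {'ondemand_update_waspen-w7a', 'ondemand_build_baspen-w7g'}

def has_required_jobs(group):
    """True when the group contains every job name in REQUIRED_JOBS."""
    return REQUIRED_JOBS.issubset(group['jname'])

# Group 26581 has both jobs; group 26578 has only one unrelated job.
df = pd.DataFrame({
    'wbdqueue_id': [26581, 26581, 26578],
    'jname': ['ondemand_build_baspen-w7g',
              'ondemand_update_waspen-w7a',
              'qa_db_insertall'],
    'startdatetime': pd.to_datetime(['2017-07-31 23:14:56',
                                     '2017-07-31 23:09:32',
                                     '2017-07-31 21:42:28']),
    'enddatetime': pd.to_datetime(['2017-07-31 23:19:12',
                                   '2017-07-31 23:10:00',
                                   '2017-07-31 21:42:55']),
})

# filter() returns the original rows of every group that passes the predicate.
complete = df.groupby('wbdqueue_id').filter(has_required_jobs)
kept_ids = sorted(complete['wbdqueue_id'].unique().tolist())
print(kept_ids)
```

The filtered frame can then be grouped again and passed to the original get_time_diff without any try-except, because every remaining group is known to contain both jobs.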
I hope this helps.