SO community, I am working with a pandas DataFrame that has the following structure:
index event_name info
8469 OPTIONS 20404,400,113,117
8470 OPTIONS_SELECTION 117
8473 OPTIONS 437,436,114,117
8475 OPTIONS_SELECTION 437
8479 OPTIONS 121,451,444,407
8481 OPTIONS_SELECTION 407
8485 OPTIONS 121,404,413,412
8486 OPTIONS_SELECTION 121
8490 OPTIONS 437,436,434,431
8491 OPTIONS_SELECTION 437
8495 OPTIONS 121,444,451,407
8516 OPTIONS 400,20404,113,117
8522 OPTIONS 20404,400,113,117
8526 OPTIONS_SELECTION 400
8583 OPTIONS 437,436,118,114
8599 OPTIONS 11455,102951,1114,54533
8606 OPTIONS 101322,2831,101734,52172
8612 OPTIONS 3610,14863,105108,105589
8619 OPTIONS 103342,2992,101274,54723
8625 OPTIONS 52903,54486,102232,7246
8631 OPTIONS 7272,105106,102318,101730
8637 OPTIONS 91102,1041,189,104323
8643 OPTIONS 5114,90881,53032,105550
8659 OPTIONS 13627,20523,1115,11123
8673 OPTIONS 1336,122,1598,54495
8674 OPTIONS_SELECTION 1598
372321 OPTIONS 90992,104945,570,21465
372322 OPTIONS_SELECTION 90992
372325 OPTIONS 946,54670,1878,1293
272815 OPTIONS 538,52112,10574,104370
... ... ...
360010 OPTIONS 1268,885,2850,531
360011 OPTIONS_SELECTION 885
360014 OPTIONS 1268,531,2850,885
360015 OPTIONS_SELECTION 1268
360023 OPTIONS 1268,531,2850,884
360037 OPTIONS 1268,531,2850,884
510658 OPTIONS 105016,1535,1516,703
13999 OPTIONS_SELECTION 105008
401305 OPTIONS 4164,1503,863,873
401314 OPTIONS 4164,1503,863,866
8422 OPTIONS 4240,20448,11604,15538
8423 OPTIONS_SELECTION 4240
8428 OPTIONS 105072,104222,3698,16491
8429 OPTIONS_SELECTION 105072
821 OPTIONS 10045,90893,105294,4126
822 OPTIONS_SELECTION 105294
836 OPTIONS 852,5383,856,863
837 OPTIONS_SELECTION 852
840 OPTIONS 5383,852,856,863
841 OPTIONS_SELECTION 863
848 OPTIONS 852,5383,856,863
849 OPTIONS_SELECTION 863
874 OPTIONS 54933,52606,104234,1430
875 OPTIONS_SELECTION 1430
878 OPTIONS 20169,52469,3488,104645
879 OPTIONS_SELECTION 104645
882 OPTIONS 12486,884,1205,4349
894 OPTIONS 852,5383,863,856
895 OPTIONS_SELECTION 856
898 OPTIONS 2922,101769,53800,90939
56307 rows × 2 columns
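I need to create a variable that holds the option the user selected in each interaction with the chatbot. Its value is the selected option (the row with event_name = OPTIONS_SELECTION) from the menu the chatbot offered (the row with event_name = OPTIONS). The code below produces the desired result, but because the dataset is very large it takes a long time to run.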
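log_df['Selection'] = ''
sel_col = log_df.columns.get_loc('Selection')
# The last row has no successor, so stop one row early.
for i in range(log_df.shape[0] - 1):
    if log_df['event_name'].iloc[i] == "OPTIONS" and log_df['event_name'].iloc[i+1] == "OPTIONS_SELECTION":
        # Positional assignment on the frame itself; the chained form
        # log_df['Selection'].iloc[i] = ... may not write back to log_df.
        log_df.iloc[i, sel_col] = log_df['info'].iloc[i+1]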
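To make the expected output concrete, here is a small reproducible sketch built from seven consecutive rows of the table above (assumption: info is stored as strings, as the comma-separated values suggest). It applies the same pairwise rule as the loop, but reads each column into a Python list once instead of calling .iloc per element, which is a large part of the loop's overhead; it is still a Python-level pass over every row. The helper name selection_by_loop is hypothetical.

```python
import pandas as pd

# Seven consecutive rows copied from the table in the question.
# Assumption: 'info' holds strings.
log_df = pd.DataFrame(
    {
        "event_name": ["OPTIONS", "OPTIONS_SELECTION", "OPTIONS", "OPTIONS",
                       "OPTIONS", "OPTIONS_SELECTION", "OPTIONS"],
        "info": ["437,436,434,431", "437", "121,444,451,407", "400,20404,113,117",
                 "20404,400,113,117", "400", "437,436,118,114"],
    },
    index=[8490, 8491, 8495, 8516, 8522, 8526, 8583],
)


def selection_by_loop(df):
    """Hypothetical helper: same rule as the question's loop, on plain lists."""
    events = df["event_name"].tolist()
    info = df["info"].tolist()
    out = [""] * len(df)
    for i in range(len(df) - 1):
        # An OPTIONS row directly followed by an OPTIONS_SELECTION row
        # receives the selected value from that next row.
        if events[i] == "OPTIONS" and events[i + 1] == "OPTIONS_SELECTION":
            out[i] = info[i + 1]
    return pd.Series(out, index=df.index)


log_df["Selection"] = selection_by_loop(log_df)
print(log_df["Selection"].tolist())
# ['437', '', '', '', '400', '', '']
```

Only the OPTIONS rows 8490 and 8522 get a value, because they are the only ones immediately followed by a selection.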
Is there a more efficient way to create this variable?
Answer 0 (score: 0)
Use Series.where:
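# Keep 'info' only on OPTIONS_SELECTION rows ('' elsewhere), then move it up one row.
log_df['Selection'] = log_df['info'].where(
    log_df['event_name'] == 'OPTIONS_SELECTION', '').shift(-1, fill_value='')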
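This keeps info only where the event is OPTIONS_SELECTION (blank elsewhere), then shifts the result up one row so each selected value lands on the preceding OPTIONS row.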
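One difference from the loop in the question: the where/shift version only checks that the next row is OPTIONS_SELECTION, not that the current row is OPTIONS, so two consecutive OPTIONS_SELECTION rows (none appear in the excerpt above) would give the first selection row a value. Below is a sketch of a stricter variant that tests both conditions with vectorized comparisons, run on seven consecutive rows copied from the question's table and assuming info holds strings. The name selection_vectorized is hypothetical.

```python
import pandas as pd

# Seven consecutive rows copied from the table in the question
# (assumption: 'info' holds strings).
log_df = pd.DataFrame(
    {
        "event_name": ["OPTIONS", "OPTIONS_SELECTION", "OPTIONS", "OPTIONS",
                       "OPTIONS", "OPTIONS_SELECTION", "OPTIONS"],
        "info": ["437,436,434,431", "437", "121,444,451,407", "400,20404,113,117",
                 "20404,400,113,117", "400", "437,436,118,114"],
    },
    index=[8490, 8491, 8495, 8516, 8522, 8526, 8583],
)


def selection_vectorized(df):
    """Hypothetical helper: vectorized form of the question's loop."""
    next_event = df["event_name"].shift(-1)  # NaN on the last row
    next_info = df["info"].shift(-1)
    # Both loop conditions: this row is OPTIONS and the next is OPTIONS_SELECTION.
    mask = (df["event_name"] == "OPTIONS") & (next_event == "OPTIONS_SELECTION")
    return next_info.where(mask, "")


# The answer's one-liner, with fill_value so the last row is '' rather than NaN.
answer = log_df["info"].where(
    log_df["event_name"] == "OPTIONS_SELECTION", "").shift(-1, fill_value="")

log_df["Selection"] = selection_vectorized(log_df)
print(log_df["Selection"].tolist())
# ['437', '', '', '', '400', '', '']
print(log_df["Selection"].tolist() == answer.tolist())
# True
```

On data where every OPTIONS_SELECTION directly follows an OPTIONS row, as in the excerpt, both versions agree; the extra mask only matters if selections can appear back to back.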