How can I vectorize this operation in pandas?

Date: 2019-04-24 13:24:12

Tags: python pandas

Hi SO community, I'm working with a pandas DataFrame that has the following structure:

index event_name    info
8469    OPTIONS 20404,400,113,117
8470    OPTIONS_SELECTION   117
8473    OPTIONS 437,436,114,117
8475    OPTIONS_SELECTION   437
8479    OPTIONS 121,451,444,407
8481    OPTIONS_SELECTION   407
8485    OPTIONS 121,404,413,412
8486    OPTIONS_SELECTION   121
8490    OPTIONS 437,436,434,431
8491    OPTIONS_SELECTION   437
8495    OPTIONS 121,444,451,407
8516    OPTIONS 400,20404,113,117
8522    OPTIONS 20404,400,113,117
8526    OPTIONS_SELECTION   400
8583    OPTIONS 437,436,118,114
8599    OPTIONS 11455,102951,1114,54533
8606    OPTIONS 101322,2831,101734,52172
8612    OPTIONS 3610,14863,105108,105589
8619    OPTIONS 103342,2992,101274,54723
8625    OPTIONS 52903,54486,102232,7246
8631    OPTIONS 7272,105106,102318,101730
8637    OPTIONS 91102,1041,189,104323
8643    OPTIONS 5114,90881,53032,105550
8659    OPTIONS 13627,20523,1115,11123
8673    OPTIONS 1336,122,1598,54495
8674    OPTIONS_SELECTION   1598
372321  OPTIONS 90992,104945,570,21465
372322  OPTIONS_SELECTION   90992
372325  OPTIONS 946,54670,1878,1293
272815  OPTIONS 538,52112,10574,104370
... ... ...
360010  OPTIONS 1268,885,2850,531
360011  OPTIONS_SELECTION   885
360014  OPTIONS 1268,531,2850,885
360015  OPTIONS_SELECTION   1268
360023  OPTIONS 1268,531,2850,884
360037  OPTIONS 1268,531,2850,884
510658  OPTIONS 105016,1535,1516,703
13999   OPTIONS_SELECTION   105008
401305  OPTIONS 4164,1503,863,873
401314  OPTIONS 4164,1503,863,866
8422    OPTIONS 4240,20448,11604,15538
8423    OPTIONS_SELECTION   4240
8428    OPTIONS 105072,104222,3698,16491
8429    OPTIONS_SELECTION   105072
821 OPTIONS 10045,90893,105294,4126
822 OPTIONS_SELECTION   105294
836 OPTIONS 852,5383,856,863
837 OPTIONS_SELECTION   852
840 OPTIONS 5383,852,856,863
841 OPTIONS_SELECTION   863
848 OPTIONS 852,5383,856,863
849 OPTIONS_SELECTION   863
874 OPTIONS 54933,52606,104234,1430
875 OPTIONS_SELECTION   1430
878 OPTIONS 20169,52469,3488,104645
879 OPTIONS_SELECTION   104645
882 OPTIONS 12486,884,1205,4349
894 OPTIONS 852,5383,863,856
895 OPTIONS_SELECTION   856
898 OPTIONS 2922,101769,53800,90939
56307 rows × 2 columns
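The index labels in this listing are not in ascending order. For example, 372325 is followed by 272815, and 510658 is followed by 13999. So "the next row" has to mean the next row by position. The sketch below assumes the row order reflects the order of events, which the .iloc loop in the question also assumes. It rebuilds a few of the rows above and shows that Series.shift works by position, while sorting by index would break the OPTIONS / OPTIONS_SELECTION pairs.

```python
import pandas as pd

# A few rows copied from the listing above; the index labels are deliberately unsorted.
log_df = pd.DataFrame(
    {
        "event_name": ["OPTIONS", "OPTIONS_SELECTION", "OPTIONS", "OPTIONS",
                       "OPTIONS", "OPTIONS_SELECTION"],
        "info": ["90992,104945,570,21465", "90992", "946,54670,1878,1293",
                 "538,52112,10574,104370", "105016,1535,1516,703", "105008"],
    },
    index=[372321, 372322, 372325, 272815, 510658, 13999],
)

# shift(-1) moves values up by position, so every row sees the row printed
# directly below it, whatever its index label is (the last row gets NaN).
next_event = log_df["event_name"].shift(-1)
next_info = log_df["info"].shift(-1)
print(pd.DataFrame({"event_name": log_df["event_name"],
                    "next_event": next_event,
                    "next_info": next_info}))

# Sorting by index would reorder the log and separate OPTIONS 510658 from the
# OPTIONS_SELECTION 13999 that follows it in the listing.
print(log_df.sort_index().index.tolist())
```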

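I need to create a variable containing the option the user chose while interacting with a chatbot. Each menu the chatbot offers is a row with event_name = OPTIONS. When the user picks an option, the next row has event_name = OPTIONS_SELECTION, and its info holds the chosen value. The new variable should carry that chosen value on the OPTIONS row. The code below produces the desired result, but the dataset is very large, so it takes a long time to run.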

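# Row-by-row version: copy the info of an OPTIONS_SELECTION row into the
# Selection column of the OPTIONS row directly above it (by position).
log_df['Selection'] = ''
sel_col = log_df.columns.get_loc('Selection')
for i in range(log_df.shape[0] - 1):  # the last row has no next row
    if log_df['event_name'].iloc[i] == "OPTIONS" and log_df['event_name'].iloc[i + 1] == "OPTIONS_SELECTION":
        # assign through a single .iloc call to avoid chained assignment
        log_df.iloc[i, sel_col] = log_df['info'].iloc[i + 1]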

Is there a more efficient way to generate this variable?

1 answer:

Answer 0 (score: 0)

Use where:

log_df['Selection'] = log_df['info'].where(
    log_df['event_name'] == 'OPTIONS_SELECTION', '').shift(-1, fill_value='')

保留事件为info的{​​{1}},然后向上移动。