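I am trying to find the first non-null value in the utm_source column for each anonymous_id and store it in a new column named first.
I asked a similar question earlier and learned that .first() returns the first non-null value. However, I am having trouble assigning that value to a new column.
Here is my code: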
first_two = pd.DataFrame(file[file['steps'] == 'Sign-ups'].sort_values(by=['ts']).groupby(['anonymous_id','year']).transform(lambda x: x['first'] == x['utm_source'].first()))
When I try to run it, I get the following error message:
KeyError: ('first', 'occurred at index Unnamed: 0')
Here is a sample of the data I am working with:
{'steps': {0: 'Sign-ups',
1: nan,
2: nan,
3: nan,
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: nan},
'utm_source': {0: nan,
1: 'facebook',
2: 'facebook',
3: nan,
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: nan},
'ts': {0: Timestamp('2018-04-11 06:59:20.206000'),
1: Timestamp('2019-05-18 05:59:11.874000'),
2: Timestamp('2018-09-10 18:19:25.260000'),
3: Timestamp('2017-10-11 08:20:18.092000'),
4: Timestamp('2017-10-11 08:20:31.466000'),
5: Timestamp('2017-10-11 08:20:37.345000'),
6: Timestamp('2017-10-11 08:21:01.322000'),
7: Timestamp('2017-10-11 08:21:14.145000'),
8: Timestamp('2017-10-11 08:23:47.526000'),
9: Timestamp('2019-06-12 10:42:50.401000')},
'anonymous_id': {0: '0000f8ea-3aa6-4423-9247-1d9580d378e1',
1: '00015d49-2cd8-41b1-bbe7-6aedbefdb098',
2: '0002226e-26a4-4f55-9578-2eff2999de7e',
3: '00022b83-240e-4ef9-aaad-ac84064bb902',
4: '00022b83-240e-4ef9-aaad-ac84064bb902',
5: '00022b83-240e-4ef9-aaad-ac84064bb902',
6: '00022b83-240e-4ef9-aaad-ac84064bb902',
7: '00022b83-240e-4ef9-aaad-ac84064bb902',
8: '00022b83-240e-4ef9-aaad-ac84064bb902',
9: '0002ed69-4aff-434d-a626-fc9b20ef1b02'},
'year': {0: 2018,
1: 2019,
2: 2018,
3: 2017,
4: 2017,
5: 2017,
6: 2017,
7: 2017,
8: 2017,
9: 2019}}
Note: I converted the DataFrame to a dictionary so that everyone can easily view and work with the data.
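For anyone who wants to load that dictionary back into a DataFrame, here is a minimal sketch. It assumes nan and Timestamp are imported from numpy and pandas (the dictionary's repr relies on both names), and it copies only the first four rows of the sample to stay short; the variable names data and df are just placeholders.

```python
import pandas as pd
from numpy import nan
from pandas import Timestamp

# First four rows of the sample above, copied unchanged.
data = {
    'steps': {0: 'Sign-ups', 1: nan, 2: nan, 3: nan},
    'utm_source': {0: nan, 1: 'facebook', 2: 'facebook', 3: nan},
    'ts': {0: Timestamp('2018-04-11 06:59:20.206000'),
           1: Timestamp('2019-05-18 05:59:11.874000'),
           2: Timestamp('2018-09-10 18:19:25.260000'),
           3: Timestamp('2017-10-11 08:20:18.092000')},
    'anonymous_id': {0: '0000f8ea-3aa6-4423-9247-1d9580d378e1',
                     1: '00015d49-2cd8-41b1-bbe7-6aedbefdb098',
                     2: '0002226e-26a4-4f55-9578-2eff2999de7e',
                     3: '00022b83-240e-4ef9-aaad-ac84064bb902'},
    'year': {0: 2018, 1: 2019, 2: 2018, 3: 2017},
}

# Outer keys become columns, inner keys become the row index.
df = pd.DataFrame(data)
print(df)
```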
An example of my expected output is:
anonymous_id utm_source first year
1111 Facebook Facebook 2017
1234 NaN NaN 2017
1243 Google Google 2018
To reiterate, the first column should hold the first non-null value found in utm_source (the first ad that the anonymous_id clicked on).
Answer 0 (score: 0)
If I understand you correctly, we can combine groupby with first_valid_index:
df.loc[df.groupby('anonymous_id')['utm_source'].apply(lambda x: x.first_valid_index())]\
.dropna(subset=['utm_source'])
Output
steps utm_source ts anonymous_id year
1.0 NaN facebook 2019-05-18 05:59:11.874 00015d49-2cd8-41b1-bbe7-6aedbefdb098 2019.0
2.0 NaN facebook 2018-09-10 18:19:25.260 0002226e-26a4-4f55-9578-2eff2999de7e 2018.0
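The result above keeps only the rows where a source was found, while the question asks for a new first column on every row. One possible way to get that, offered as a sketch rather than the only approach, is groupby(...).transform('first'), which broadcasts each group's first non-null value back to all of its rows. The small frame below is made up for illustration (ids modeled on the expected output in the question, timestamps invented). Separately, on recent pandas versions the .loc lookup above may raise a KeyError when a group has no valid value, since first_valid_index returns None for it; the transform approach avoids label lookups altogether.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only; not taken from the asker's file.
df = pd.DataFrame({
    'anonymous_id': ['1111', '1111', '1111', '1234', '1243', '1243'],
    'utm_source': [np.nan, 'Facebook', 'Google', np.nan, 'Google', np.nan],
    'ts': pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-03',
                          '2017-02-01', '2018-03-01', '2018-03-02']),
    'year': [2017, 2017, 2017, 2017, 2018, 2018],
})

# Sort by time so "first" means "earliest" within each group.
df = df.sort_values('ts')

# 'first' skips nulls, so each row receives its group's earliest
# non-null utm_source; groups with no source at all stay null.
df['first'] = (
    df.groupby(['anonymous_id', 'year'])['utm_source'].transform('first')
)
print(df[['anonymous_id', 'utm_source', 'first', 'year']])
```

Because transform returns a result aligned to the original index, it can be assigned straight to a new column, which is the step the lambda in the question was attempting.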