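I have a data frame with the columns shown below. When a username appears more than once and one of its rows has an empty Owner Region, that row should take the region from the duplicate row. The input is: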
Queue  Owner Region  username
xxy                  aan
xyz    india         aan
yyx                  aandiapp
xox    UK            aandiapp
yox    china         aashwins
zxy                  aashwins
yoz    aus           aasyed
zxo                  aasyed
The desired output is:
Queue  Owner Region  username
xxy    india         aan
xyz    india         aan
yyx    UK            aandiapp
xox    UK            aandiapp
yox    china         aashwins
zxy    china         aashwins
yoz    aus           aasyed
zxo    aus           aasyed
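Can anyone please help? Thanks in advance.
Answer 0 (score: 1)
I think you first need to replace the empty values with NaN, and then fill them by forward and backward filling within each group: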
import numpy as np

df['Owner Region'] = df['Owner Region'].replace('', np.nan)
df['Owner Region'] = df.groupby('username')['Owner Region'].transform(lambda x: x.ffill().bfill())
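To try this approach end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the sample input in the question, with empty strings standing in for the missing regions; the name `filled` is introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; empty strings mark the
# missing regions (an assumption about how the blanks are stored).
df = pd.DataFrame({
    "Queue": ["xxy", "xyz", "yyx", "xox", "yox", "zxy", "yoz", "zxo"],
    "Owner Region": ["", "india", "", "UK", "china", "", "aus", ""],
    "username": ["aan", "aan", "aandiapp", "aandiapp",
                 "aashwins", "aashwins", "aasyed", "aasyed"],
})

# Turn blanks into NaN so the fill methods can see them.
df["Owner Region"] = df["Owner Region"].replace("", np.nan)

# Fill within each username: forward first, then backward for leading gaps.
df["Owner Region"] = (
    df.groupby("username")["Owner Region"]
      .transform(lambda s: s.ffill().bfill())
)

filled = df["Owner Region"].tolist()
```

Because `transform` returns a result with the same index as the input, it can be assigned straight back to the column.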
Answer 1 (score: 1)
You can use mask and groupby.
df['Owner Region'] = (
    df['Owner Region']
      .mask(df['Owner Region'].str.len().eq(0))
      .groupby(df.username)
      .ffill()
      .bfill())
df
  Queue Owner Region  username
0   xxy        india       aan
1   xyz        india       aan
2   yyx           UK  aandiapp
3   xox           UK  aandiapp
4   yox        china  aashwins
5   zxy        china  aashwins
6   yoz          aus    aasyed
7   zxo          aus    aasyed
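After calling groupby + ffill, the subsequent bfill call does not need to be grouped, as long as the rows for each username are adjacent (as they are here) and every group has at least one non-empty value.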
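The adjacency condition matters. A small sketch with hypothetical data (not from the question), where the rows for one user are interleaved with another user's rows, shows how an ungrouped bfill can pick up a value from the wrong user:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the two rows for user "a" are not adjacent.
df = pd.DataFrame({
    "username": ["a", "b", "a"],
    "Owner Region": [np.nan, "UK", "india"],
})

grouped_ffill = df["Owner Region"].groupby(df["username"]).ffill()

# Ungrouped bfill: row 0 takes the next value in row order, which belongs to "b".
ungrouped = grouped_ffill.bfill().tolist()

# Grouped bfill: row 0 takes the next value within user "a".
grouped = grouped_ffill.groupby(df["username"]).bfill().tolist()
```

If the rows cannot be assumed adjacent, grouping the bfill as well (or sorting by username first) avoids the problem.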
If a group may contain only NaN, the ungrouped bfill would pull in a value from the next group, so you cannot avoid apply
...
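df['Owner Region'] = (
    df['Owner Region']
      .mask(df['Owner Region'].str.len().eq(0))
      # group_keys=False keeps the original row index, so the result
      # aligns with df (newer pandas otherwise prepends the group key)
      .groupby(df.username, group_keys=False)
      .apply(lambda x: x.ffill().bfill()))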
df
  Queue Owner Region  username
0   xxy        india       aan
1   xyz        india       aan
2   yyx           UK  aandiapp
3   xox           UK  aandiapp
4   yox        china  aashwins
5   zxy        china  aashwins
6   yoz          aus    aasyed
7   zxo          aus    aasyed
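To see why the all-NaN case needs a per-group fill, here is a short sketch on hypothetical data in which user "b" has no region at all. It uses transform rather than apply for the per-group version, since transform always returns a result on the original index; the names `leaky` and `per_group` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: user "b" has no region in any row.
df = pd.DataFrame({
    "username": ["a", "a", "b", "b", "c", "c"],
    "Owner Region": ["", "india", "", "", "", "aus"],
})

# Replace empty strings with NaN, as in the answer above.
blank_as_nan = df["Owner Region"].mask(df["Owner Region"].str.len().eq(0))

# Grouped ffill followed by an ungrouped bfill: "b" wrongly inherits from "c".
leaky = blank_as_nan.groupby(df["username"]).ffill().bfill()

# Both fills applied inside each group: "b" stays NaN.
per_group = blank_as_nan.groupby(df["username"]).transform(
    lambda s: s.ffill().bfill()
)
```

With the per-group version, a user without any known region keeps NaN instead of silently receiving another user's value.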