If a column has duplicates and the next column has a value, how do I add that value to the duplicate rows?

Time: 2018-07-16 06:12:25

Tags: python-3.x pandas

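I have a DataFrame with the columns shown below. If a username is repeated and one of its rows has an Owner Region, that value should also be filled into the other rows for the same username. The input is: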

Queue    Owner Region    username
xxy                      aan
xyz      india           aan
yyx                      aandiapp
xox      UK              aandiapp
yox      china           aashwins
zxy                      aashwins
yoz      aus             aasyed
zxo                      aasyed

The desired output is:

Queue    Owner Region    username
xxy      india           aan
xyz      india           aan
yyx      UK              aandiapp
xox      UK              aandiapp
yox      china           aashwins
zxy      china           aashwins
yoz      aus             aasyed
zxo      aus             aasyed

Can anyone please help me? Thanks in advance.

2 Answers:

Answer 0: (score: 1)

I think you need to replace the empty values with NaN first, then fill them per group with a forward fill followed by a backward fill:

import numpy as np

df['Owner Region'] = df['Owner Region'].replace('', np.nan)
df['Owner Region'] = df.groupby('username')['Owner Region'].transform(lambda x: x.ffill().bfill())
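For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same replace-then-fill idea. It assumes the blank Owner Region cells are empty strings; if they are whitespace or already NaN in your data, the replace step would need adjusting.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
# Assumption: blank regions are stored as empty strings.
df = pd.DataFrame({
    "Queue": ["xxy", "xyz", "yyx", "xox", "yox", "zxy", "yoz", "zxo"],
    "Owner Region": ["", "india", "", "UK", "china", "", "aus", ""],
    "username": ["aan", "aan", "aandiapp", "aandiapp",
                 "aashwins", "aashwins", "aasyed", "aasyed"],
})

# Treat empty strings as missing values.
region = df["Owner Region"].replace("", np.nan)

# Within each username group, fill forward and then backward.
df["Owner Region"] = region.groupby(df["username"]).transform(
    lambda s: s.ffill().bfill()
)

print(df)
```

Because the fill runs inside each group, it does not matter whether the non-empty value comes first or last among the rows for a username.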

Answer 1: (score: 1)

You can use mask + groupby:

df['Owner Region'] = (
   df['Owner Region']
     .mask(df['Owner Region'].str.len().eq(0))
     .groupby(df.username)
     .ffill()
     .bfill())

df
  Queue Owner Region  username
0   xxy        india       aan
1   xyz        india       aan
2   yyx           UK  aandiapp
3   xox           UK  aandiapp
4   yox        china  aashwins
5   zxy        china  aashwins
6   yoz          aus    aasyed
7   zxo          aus    aasyed

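After calling groupby + ffill, the subsequent bfill call does not need to be grouped, provided the rows of each group are contiguous and every group contains at least one non-null value.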
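The ungrouped bfill shortcut relies on each username's rows sitting next to each other. A small sketch with made-up data (the usernames "a" and "b" and the regions "x" and "y" are hypothetical, not from the question) shows what can happen when the groups are interleaved:

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows for "a" and "b" are interleaved, not contiguous.
user = pd.Series(["a", "b", "a", "b"])
region = pd.Series([np.nan, np.nan, "x", "y"])

# Grouped ffill, then a plain (ungrouped) bfill.
ungrouped_bfill = region.groupby(user).ffill().bfill()

# Both fills performed inside each group.
grouped_bfill = region.groupby(user).transform(lambda s: s.ffill().bfill())

print(ungrouped_bfill.tolist())  # ['x', 'x', 'x', 'y'] -> row 1 ("b") picked up "x"
print(grouped_bfill.tolist())    # ['x', 'y', 'x', 'y']
```

If the frame is sorted by username first, or the data already arrives grouped as in the question, the shorter form gives the same result as the grouped one.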


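If a group might contain only NaN values, there is no way around apply, because an ungrouped bfill would pull in a value from the next group: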

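df['Owner Region'] = (
   df['Owner Region']
     .mask(df['Owner Region'].str.len().eq(0))
     # group_keys=False keeps the original row index so the assignment aligns (matters on pandas 2.0+)
     .groupby(df.username, group_keys=False)
     .apply(lambda x: x.ffill().bfill()))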

df
  Queue Owner Region  username
0   xxy        india       aan
1   xyz        india       aan
2   yyx           UK  aandiapp
3   xox           UK  aandiapp
4   yox        china  aashwins
5   zxy        china  aashwins
6   yoz          aus    aasyed
7   zxo          aus    aasyed
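To illustrate the all-NaN case, here is a hedged sketch with a made-up username "ghost" that has no region on any row. It uses transform rather than apply, which is a swapped-in alternative and not the answer's exact method; transform always returns a result aligned to the original index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "ghost" has no Owner Region on any of its rows.
df = pd.DataFrame({
    "Owner Region": ["", "india", "", "", "aus", ""],
    "username": ["aan", "aan", "ghost", "ghost", "aasyed", "aasyed"],
})

# Turn empty strings into NaN, as in the answer above.
masked = df["Owner Region"].mask(df["Owner Region"].str.len().eq(0))

# Fill forward and backward strictly inside each username group.
df["Owner Region"] = masked.groupby(df["username"]).transform(
    lambda s: s.ffill().bfill()
)

print(df)
```

The rows for "ghost" stay NaN instead of borrowing "aus" from the following group, which is the behavior the grouped fill is meant to guarantee.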