I have a DataFrame like this:
Name Food Year_eaten Month_eaten
Maria Rice 2014 3
Maria Rice 2015 NaN
Maria Rice 2016 NaN
Jack Steak 2011 NaN
Jack Steak 2012 5
Jack Steak 2013 NaN
I would like the output to look like this:
Name Food Year_eaten Month_eaten
Maria Rice 2014 3
Maria Rice 2015 3
Maria Rice 2016 3
Jack Steak 2011 5
Jack Steak 2012 5
Jack Steak 2013 5
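I want to fill the NaNs based on the following condition:
If a row's Name and Food are the same and the Years are consecutive:
Fill the NaNs with the Month_eaten value from the row that is not NaN
Some people will have NaN in every Month_eaten row, but I don't need to worry about them for now. I only care about people who have at least one Month_eaten value in some year.
Any ideas would be greatly appreciated!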
Answer 0 (score: 3)
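You can group on "Name", "Food", and a custom key created by applying diff to the "Year_eaten" column, so that each run of consecutive years gets its own group.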
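# u: run id that increments whenever the year does not increase by exactly 1
u = df.Year_eaten.diff().bfill().ne(1).cumsum()
# v: first non-null Month_eaten within each (Name, Food, run) group
v = df.groupby(['Name','Food', u]).Month_eaten.transform('first')
df['Month_eaten'] = df.Month_eaten.fillna(v, downcast='infer')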
df
Name Food Year_eaten Month_eaten
0 Maria Rice 2014 3
1 Maria Rice 2015 3
2 Maria Rice 2016 3
3 Jack Steak 2011 5
4 Jack Steak 2012 5
5 Jack Steak 2013 5
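For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question; the names `run_id` and `first_month` are my own, not from the answer. It casts with `astype("Int64")` instead of passing `downcast='infer'`, on the assumption that newer pandas releases may warn about or reject the `downcast` argument of `fillna`.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Name": ["Maria"] * 3 + ["Jack"] * 3,
    "Food": ["Rice"] * 3 + ["Steak"] * 3,
    "Year_eaten": [2014, 2015, 2016, 2011, 2012, 2013],
    "Month_eaten": [3, np.nan, np.nan, np.nan, 5, np.nan],
})

# A new run starts wherever the year does not increase by exactly 1.
# `run_id` is an illustrative name; for this data it is 0, 0, 0, 1, 1, 1.
run_id = df["Year_eaten"].diff().bfill().ne(1).cumsum().rename("run_id")

# 'first' picks the first non-null Month_eaten within each group.
first_month = (
    df.groupby(["Name", "Food", run_id])["Month_eaten"].transform("first")
)

# Fill the gaps, then cast to a nullable integer dtype so that any
# group that is entirely NaN would stay missing instead of raising.
df["Month_eaten"] = df["Month_eaten"].fillna(first_month).astype("Int64")
print(df)
```

The nullable `Int64` dtype is a design choice for this sketch: a plain `astype(int)` would fail if some group had no month at all.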
If no group consists entirely of NaN rows, another solution is to use groupby with ffill followed by bfill (everything else stays the same).
df['Month_eaten'] = df.groupby(['Name','Food', u]).Month_eaten.ffill().bfill()
df
Name Food Year_eaten Month_eaten
0 Maria Rice 2014 3
1 Maria Rice 2015 3
2 Maria Rice 2016 3
3 Jack Steak 2011 5
4 Jack Steak 2012 5
5 Jack Steak 2013 5
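The caveat about all-NaN groups matters because the trailing `.bfill()` runs on the already-combined Series, not per group. The sketch below illustrates this with a hypothetical extra person ("Ann", invented for this example) whose months are all missing, and compares it with a group-aware variant built on `transform`. The variable names are my own.

```python
import numpy as np
import pandas as pd

# "Ann" is a made-up person with no known month at all.
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Maria", "Maria", "Maria"],
    "Food": ["Soup", "Soup", "Rice", "Rice", "Rice"],
    "Year_eaten": [2010, 2011, 2014, 2015, 2016],
    "Month_eaten": [np.nan, np.nan, 3, np.nan, np.nan],
})

run_id = df["Year_eaten"].diff().bfill().ne(1).cumsum().rename("run_id")
grouped = df.groupby(["Name", "Food", run_id])["Month_eaten"]

# ffill is applied per group, but the chained bfill is a plain
# Series.bfill, so Ann's rows pick up Maria's month.
leaky = grouped.ffill().bfill()

# Doing both fills inside transform keeps them within each group,
# so Ann's rows stay NaN.
safe = grouped.transform(lambda s: s.ffill().bfill())

print(pd.DataFrame({"leaky": leaky, "safe": safe}))
```

If all-NaN groups can occur in your data, the `transform` form is probably the safer of the two.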
Answer 1 (score: 1)
Use diff().ne(1).cumsum() within each Name/Food group to create a key for runs of consecutive years
continueyear=df.groupby(['Name','Food']).Year_eaten.apply(lambda x : x.diff().ne(1).cumsum())
Then use groupby together with apply, ffill and bfill
df.groupby([df.Name,df.Food,continueyear]).Month_eaten.apply(lambda x : x.ffill().bfill().astype(int))
Out[26]:
0 3
1 3
2 3
3 5
4 5
5 5
Name: Month_eaten, dtype: int32
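A possible variation on this answer, sketched under the assumption that you want to avoid `apply` with a lambda (whose result index can differ between pandas versions): `groupby(...).diff()` returns a Series aligned with the original index, so it can be used directly as a grouping key. The names `year_run` and `filled` are illustrative, not from the answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Name": ["Maria"] * 3 + ["Jack"] * 3,
    "Food": ["Rice"] * 3 + ["Steak"] * 3,
    "Year_eaten": [2014, 2015, 2016, 2011, 2012, 2013],
    "Month_eaten": [3, np.nan, np.nan, np.nan, 5, np.nan],
})

# Year differences computed inside each (Name, Food) group. The first
# row of every group is NaN, which counts as "not equal to 1" and so
# starts a new run. The cumsum here is global, unlike the per-group
# cumsum in the answer, but it still separates the runs.
year_run = (
    df.groupby(["Name", "Food"])["Year_eaten"].diff().ne(1).cumsum()
    .rename("year_run")
)

# Forward- and backward-fill within each run, then cast to int.
# The cast assumes every run has at least one known month.
filled = (
    df.groupby(["Name", "Food", year_run])["Month_eaten"]
    .transform(lambda s: s.ffill().bfill())
    .astype(int)
)
print(filled)
```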