Question

I have a DataFrame like this:

Name      Food      Year_eaten      Month_eaten

Maria     Rice        2014               3
Maria     Rice        2015              NaN
Maria     Rice        2016              NaN
Jack      Steak       2011              NaN
Jack      Steak       2012               5
Jack      Steak       2013              NaN

I want the output to look like this:

Name      Food      Year_eaten      Month_eaten

Maria     Rice        2014               3
Maria     Rice        2015               3
Maria     Rice        2016               3
Jack      Steak       2011               5
Jack      Steak       2012               5
Jack      Steak       2013               5

I want to fill the NaNs based on the following condition:

If the rows' Name and Food are the same and the years are consecutive:
     Fill the NaNs with the Month_eaten of the row that isn't NaN

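There will be some people whose Month_eaten is NaN in every year, but I don't need to worry about them for now. I only care about people who have at least one Month_eaten value in some year.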

Any ideas would be appreciated!

Answer 1

You can group by Name, Food, and a custom key built with diff on Year_eaten, which starts a new group wherever the years stop being consecutive.

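# New run id wherever a row's year is not the previous row's year + 1
u = df.Year_eaten.diff().bfill().ne(1).cumsum()
# First non-null month in each (Name, Food, run) group, broadcast to every row of that group
v = df.groupby(['Name','Food', u]).Month_eaten.transform('first')

# astype(int) stands in for downcast='infer', which newer pandas deprecates;
# it assumes no NaN is left after filling
df['Month_eaten'] = df.Month_eaten.fillna(v).astype(int)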

df
    Name   Food  Year_eaten  Month_eaten
0  Maria   Rice        2014            3
1  Maria   Rice        2015            3
2  Maria   Rice        2016            3
3   Jack  Steak        2011            5
4   Jack  Steak        2012            5
5   Jack  Steak        2013            5
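To see why the key works, here is a sketch that prints each intermediate step of u on the question's sample data. The DataFrame construction and the names step, breaks, and first_month are added for illustration only. diff() leaves the first row as NaN; bfill() copies the next row's difference into it, and every value other than 1 then marks the start of a new run.

```python
import numpy as np
import pandas as pd

# Sample data from the question (NaN = missing month)
df = pd.DataFrame({
    'Name': ['Maria'] * 3 + ['Jack'] * 3,
    'Food': ['Rice'] * 3 + ['Steak'] * 3,
    'Year_eaten': [2014, 2015, 2016, 2011, 2012, 2013],
    'Month_eaten': [3, np.nan, np.nan, np.nan, 5, np.nan],
})

step = df.Year_eaten.diff()   # NaN, 1, 1, -5, 1, 1
step = step.bfill()           # first row takes the next difference: 1, 1, 1, -5, 1, 1
breaks = step.ne(1)           # True where a new run of consecutive years starts
u = breaks.cumsum()           # run id per row: 0, 0, 0, 1, 1, 1

# 'first' skips NaN, so each group gets its one known month
first_month = df.groupby(['Name', 'Food', u]).Month_eaten.transform('first')
print(pd.DataFrame({'step': step, 'break': breaks, 'u': u, 'first': first_month}))
```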

If no group has all of its rows NaN, another option is groupby with ffill and bfill (everything else stays the same):

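# ffill and bfill both run inside each group, so values never cross group boundaries
df['Month_eaten'] = df.groupby(['Name','Food', u]).Month_eaten.transform(lambda s: s.ffill().bfill()).astype(int)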
df
    Name   Food  Year_eaten  Month_eaten
0  Maria   Rice        2014            3
1  Maria   Rice        2015            3
2  Maria   Rice        2016            3
3   Jack  Steak        2011            5
4   Jack  Steak        2012            5
5   Jack  Steak        2013            5
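The "no all-NaN group" caveat matters for where bfill runs. A sketch with a hypothetical third person, Ana, whose months are all missing, placed between Maria and Jack: calling .bfill() on the result of a grouped ffill() runs over the whole column, so Ana's rows borrow Jack's month, whereas putting both fills inside transform keeps Ana as NaN. The data and the names chained and per_group are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Ana" has no known month and sits between Maria and Jack
df = pd.DataFrame({
    'Name': ['Maria'] * 3 + ['Ana'] * 2 + ['Jack'] * 3,
    'Food': ['Rice'] * 3 + ['Soup'] * 2 + ['Steak'] * 3,
    'Year_eaten': [2014, 2015, 2016, 2018, 2019, 2011, 2012, 2013],
    'Month_eaten': [3, np.nan, np.nan, np.nan, np.nan, np.nan, 5, np.nan],
})
u = df.Year_eaten.diff().bfill().ne(1).cumsum()
g = df.groupby(['Name', 'Food', u]).Month_eaten

# Grouped ffill, then an ungrouped bfill: Ana's NaNs are filled from Jack's 5
chained = g.ffill().bfill()

# Both fills inside each group: Ana stays NaN
per_group = g.transform(lambda s: s.ffill().bfill())

print(pd.DataFrame({'Name': df.Name, 'chained': chained, 'per_group': per_group}))
```

If all-NaN groups can occur, casting the per-group result with astype('Int64') keeps them as <NA> instead of raising like astype(int) would.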

Answer 2

Use diff().ne(1).cumsum() to create a year-group key:

# transform keeps the original row index, so the key lines up with df
continueyear = df.groupby(['Name','Food']).Year_eaten.transform(lambda x: x.diff().ne(1).cumsum())

Then use groupby and run ffill and bfill within each group:

df.groupby([df.Name, df.Food, continueyear]).Month_eaten.transform(lambda x: x.ffill().bfill()).astype(int)
Out[26]:
0    3
1    3
2    3
3    5
4    5
5    5
Name: Month_eaten, dtype: int32
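Because this key is computed inside each (Name, Food) group, it does not rely on a person's rows being adjacent in the frame. A sketch on a hypothetical, interleaved version of the data (two people with the same years, rows ordered by year), compared with the whole-column key used in the first answer; transform is used instead of apply on the assumption that a result indexed like df is wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows ordered by year, so Maria's and Jack's rows alternate
df = pd.DataFrame({
    'Name': ['Maria', 'Jack', 'Maria', 'Jack', 'Maria', 'Jack'],
    'Food': ['Rice', 'Steak', 'Rice', 'Steak', 'Rice', 'Steak'],
    'Year_eaten': [2011, 2011, 2012, 2012, 2013, 2013],
    'Month_eaten': [3, np.nan, np.nan, 5, np.nan, np.nan],
})

# Year-run key computed separately within each (Name, Food) group
key = df.groupby(['Name', 'Food']).Year_eaten.transform(lambda x: x.diff().ne(1).cumsum())
filled = df.groupby([df.Name, df.Food, key]).Month_eaten.transform(lambda x: x.ffill().bfill())

# Whole-column key: diff compares Maria's year with Jack's, so every row ends up alone
global_u = df.Year_eaten.diff().bfill().ne(1).cumsum()
naive = df.groupby(['Name', 'Food', global_u]).Month_eaten.transform('first')

print(df.assign(key=key, filled=filled, naive=naive))
```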

Fill NaN values based on another column and rows in pandas
