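Suppose I have the following data frame of firm panel data. The variable entry indicates when a firm enters the market, and I would like to build a cohort from it (to track firms over time). Is there a way to do this? (Basically, the variable cohort should hold the year in which entry = 1, starting from that row.)
From this data frame: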
id year entry
1 2009 0
1 2012 1
1 2013 0
1 2014 0
2 2010 1
2 2011 0
2 2012 0
3 2007 0
3 2008 0
3 2012 1
3 2013 0
I need to end up with something like this:
id year entry cohort
1 2009 0 NaN
1 2012 1 2012
1 2013 0 2012
1 2014 0 2012
2 2010 1 2010
2 2011 0 2010
2 2012 0 2010
3 2007 0 NaN
3 2008 0 NaN
3 2012 1 2012
3 2013 0 2012
Thanks a lot, and sorry for my English; I am not a native speaker (I am practicing it, just like Python).
Answer 0 (score: 2)
Keep year only where entry equals 1; every other row becomes NaN:
df.year.where(df.entry == 1)
#0 NaN
#1 2012.0
#2 NaN
#3 NaN
#4 2010.0
#5 NaN
#6 NaN
#7 NaN
#8 NaN
#9 2012.0
#10 NaN
#Name: year, dtype: float64
Then use groupby + ffill to carry the entry year forward within each id:
df["cohort"] = df.year.where(df.entry == 1).groupby(df.id).ffill()
df
# id year entry cohort
#0 1 2009 0 NaN
#1 1 2012 1 2012.0
#2 1 2013 0 2012.0
#3 1 2014 0 2012.0
#4 2 2010 1 2010.0
#5 2 2011 0 2010.0
#6 2 2012 0 2010.0
#7 3 2007 0 NaN
#8 3 2008 0 NaN
#9 3 2012 1 2012.0
#10 3 2013 0 2012.0
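For anyone who wants to run the steps above end to end, here is a minimal, self-contained sketch. It rebuilds the example data frame from the question. The sort_values call and the conversion to the nullable Int64 dtype are my own additions, not part of the answer: the first assumes the rows may not already be in chronological order within each id, and the second is only cosmetic, so that cohort shows 2012 rather than 2012.0.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "year":  [2009, 2012, 2013, 2014, 2010, 2011, 2012, 2007, 2008, 2012, 2013],
    "entry": [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
})

# Assumption: forward fill only makes sense if rows are ordered by year
# within each firm, so sort explicitly instead of relying on input order.
df = df.sort_values(["id", "year"]).reset_index(drop=True)

# Step 1: keep the year only on rows where the firm enters the market.
entry_year = df["year"].where(df["entry"] == 1)

# Step 2: carry that year forward within each firm; rows before entry stay missing.
df["cohort"] = entry_year.groupby(df["id"]).ffill()

# Optional (assumption): nullable integer dtype so years are not shown as floats.
df["cohort"] = df["cohort"].astype("Int64")

print(df)
```

Rows that come before a firm's entry year keep a missing cohort, which matches the desired output in the question.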
Answer 1 (score: 0)
IIUC
df['cohort']=df.year.mask(df.entry==0).groupby(df.id).ffill()
df
Out[202]:
id year entry cohort
0 1 2009 0 NaN
1 1 2012 1 2012.0
2 1 2013 0 2012.0
3 1 2014 0 2012.0
4 2 2010 1 2010.0
5 2 2011 0 2010.0
6 2 2012 0 2010.0
7 3 2007 0 NaN
8 3 2008 0 NaN
9 3 2012 1 2012.0
10 3 2013 0 2012.0
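The two answers differ only in how they blank out the non-entry rows: where(entry == 1) keeps the rows that satisfy the condition, while mask(entry == 0) hides the rows that satisfy it. A quick sketch to check that they agree on the example data, assuming entry only ever takes the values 0 and 1 (with any other value the two conditions would no longer be complements):

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "year":  [2009, 2012, 2013, 2014, 2010, 2011, 2012, 2007, 2008, 2012, 2013],
    "entry": [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
})

# where: keep values where the condition is True, NaN elsewhere.
via_where = df["year"].where(df["entry"] == 1).groupby(df["id"]).ffill()

# mask: replace values with NaN where the condition is True.
via_mask = df["year"].mask(df["entry"] == 0).groupby(df["id"]).ffill()

# Series.equals treats NaNs in the same positions as equal.
same = via_where.equals(via_mask)
print(same)
```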