Data::
   Unnamed: 0  gvkey  date     CUSIP                     conm    tic       cik  PERMNO                   COMNAM
0           0   1001  1983  00016510  A & M FOOD SERVICES INC  AMFD.  723576.0   10015                      NaN
1           1   1001  1983  00016510  A & M FOOD SERVICES INC  AMFD.  723576.0   10015  A & M FOOD SERVICES INC
2           5   1001  1984  00016510  A & M FOOD SERVICES INC  AMFD.  723576.0   10015  A & M FOOD SERVICES INC
3          17   1001  1985  00016510  A & M FOOD SERVICES INC  AMFD.  723576.0   10015  A & M FOOD SERVICES INC
4          29   1003  1983  00035410    A.A. IMPORTING CO INC   ANTQ  730052.0   10031                      NaN
Goal::
Get the PERMNO (from the data) of a specific observation in the current year.
Condition::
For example: year = 1983, gvkey = 1001, next_year = 1984
What I tried::
df = DATA
df[(df['date'] == year) & (df['date'] == gvkey) & (df[df['date'] == next_year]['COMNAM'].isna() != 1])]
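However, it returns no observations.
I think this is because the code contains two mutually exclusive conditions: df['date'] == year and df['date'] == next_year
Can anyone give me some advice? Thanks!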
Answer 0 (score: 0)
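pd.Series.isna returns a Series, not a single Boolean. More importantly, because you first applied a filter via df[df['date'] == next_year], the index of that Boolean Series will differ from the index of your first two masks.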
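To make the index mismatch concrete, here is a minimal sketch. It rebuilds only the date and COMNAM columns from the sample rows in the question; the names full_mask and sub_mask are illustrative and do not appear in the original post.

```python
import numpy as np
import pandas as pd

# Two columns copied from the sample rows in the question.
df = pd.DataFrame({
    "date": [1983, 1983, 1984, 1985, 1983],
    "COMNAM": [np.nan, "A & M FOOD SERVICES INC", "A & M FOOD SERVICES INC",
               "A & M FOOD SERVICES INC", np.nan],
})
next_year = 1984

# A mask built on the whole frame keeps every row label.
full_mask = df["date"] == 1983

# A mask built on a filtered frame keeps only the labels that survived the filter.
sub_mask = df[df["date"] == next_year]["COMNAM"].isna()

print(full_mask.index.tolist())  # [0, 1, 2, 3, 4]
print(sub_mask.index.tolist())   # [2]
```

The two masks describe different sets of rows, so combining them with & cannot express "this row's year, and the following year's COMNAM".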
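You should avoid chained indexing, which is explicitly discouraged in the docs. Instead, you can look up the in-scope years and then use pd.Series.isin. Finally, for readability, I suggest building several masks and combining them: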
m1 = df['date'].eq(year)
m2 = df['gvkey'].eq(gvkey)
viable_years = df.loc[m2 & df['COMNAM'].notnull(), 'date'].values # returns in-scope years
m3 = (df['date'] + 1).isin(viable_years) # check next year is a good year
res = df[m1 & m2 & m3]
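As a rough end-to-end check, the masks above can be run against the sample rows from the question. This is only a sketch: the frame keeps just the columns the masks need, and permno_for is a hypothetical helper name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, reduced to the columns the masks use.
df = pd.DataFrame({
    "gvkey": [1001, 1001, 1001, 1001, 1003],
    "date": [1983, 1983, 1984, 1985, 1983],
    "PERMNO": [10015, 10015, 10015, 10015, 10031],
    "COMNAM": [np.nan, "A & M FOOD SERVICES INC", "A & M FOOD SERVICES INC",
               "A & M FOOD SERVICES INC", np.nan],
})


def permno_for(df, year, gvkey):
    """Hypothetical helper: PERMNO values for (year, gvkey) when the same
    gvkey has a non-null COMNAM in the following year."""
    m1 = df["date"].eq(year)
    m2 = df["gvkey"].eq(gvkey)
    # Years in which this gvkey has a non-null COMNAM.
    viable_years = df.loc[m2 & df["COMNAM"].notnull(), "date"].values
    # True where the row's following year is one of those years.
    m3 = (df["date"] + 1).isin(viable_years)
    return df.loc[m1 & m2 & m3, "PERMNO"].unique().tolist()


print(permno_for(df, 1983, 1001))  # [10015]
print(permno_for(df, 1983, 1003))  # []
```

The second call returns an empty list because gvkey 1003 has no row with a non-null COMNAM in the sample, so no following year qualifies.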