Select the last row per id in panel data and use it in np.where

Time: 2017-11-10 15:55:22

Tags: python python-3.x pandas numpy

How can I select the last position for each id and check whether the variable fecha is greater than 252, so that I can use the result in np.where?

        id     clae6  year    quarter   fecha   fecha_dif2   position 
         1  475230.0  2007          1     220          -1       1
         1  475230.0  2007          2     221          -1       2
         1  475230.0  2007          3     222          -1       3
         1  475230.0  2007          4     223          -1       4 
         1  475230.0  2008          1     224          -1       5
         2  475230.0  2007          1     220          -1       1
         2  475230.0  2007          2     221          -1       2
         2  475230.0  2007          3     222          -1       3
         2  475230.0  2007          4     223          -1       4
         2  475230.0  2008          1     224          -1       5
         3  475230.0  2010          1     232          -1       1
         3  475230.0  2010          2     233          -1       2
         3  475230.0  2010          3     234          -1       3 
         3  475230.0  2010          4     235          -1       4
         3  475230.0  2011          1     236          -1       5
         3  475230.0  2011          2     237          -1       6

2 answers:

Answer 0 (score: 2)

Without groupby, using drop_duplicates with keep='last':

df.drop_duplicates(['id'],keep='last').fecha.gt(252)
Out[213]: 
4     False
9     False
15    False
Name: fecha, dtype: bool

df['fechatest']=df.drop_duplicates(['id'],keep='last').fecha.gt(252)
df.fillna(False)
Out[216]: 
    id     clae6  year  quarter  fecha  fecha_dif2  position  fechatest
0    1  475230.0  2007        1    220          -1         1      False
1    1  475230.0  2007        2    221          -1         2      False
2    1  475230.0  2007        3    222          -1         3      False
3    1  475230.0  2007        4    223          -1         4      False
4    1  475230.0  2008        1    224          -1         5      False
5    2  475230.0  2007        1    220          -1         1      False
6    2  475230.0  2007        2    221          -1         2      False
7    2  475230.0  2007        3    222          -1         3      False
8    2  475230.0  2007        4    223          -1         4      False
9    2  475230.0  2008        1    224          -1         5      False
10   3  475230.0  2010        1    232          -1         1      False
11   3  475230.0  2010        2    233          -1         2      False
12   3  475230.0  2010        3    234          -1         3      False
13   3  475230.0  2010        4    235          -1         4      False
14   3  475230.0  2011        1    236          -1         5      False
15   3  475230.0  2011        2    237          -1         6      False
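Note that df.fillna(False) returns a new frame and leaves df itself unchanged, so the fechatest column still holds NaN outside the last rows unless the result is assigned back. A minimal sketch of one way to get a clean boolean column directly, assuming rows are already sorted within each id; the small frame below is hypothetical and was chosen so that one id actually ends above 252:

```python
import pandas as pd

# Hypothetical data: id 3 ends above the 252 threshold, ids 1 and 2 do not.
df = pd.DataFrame({
    "id":    [1, 1, 2, 2, 3, 3],
    "fecha": [223, 224, 223, 224, 252, 253],
})

# Last row of each id (relies on the existing row order within each id).
last_rows = df.drop_duplicates("id", keep="last")

# True only where the row is the last one for its id AND fecha > 252.
df["fechatest"] = df["fecha"].gt(252) & df.index.isin(last_rows.index)

print(df)
```

Because both operands are plain booleans, the column keeps a bool dtype and no fillna step is needed.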

Answer 1 (score: 0)

First use groupby with tail, then compare:

mask = df.groupby('id')['fecha'].tail(1) > 252
#same as
#mask = df.groupby('id')['fecha'].tail(1).gt(252)
print (mask)
4     False
9     False
15    False
Name: fecha, dtype: bool

If you need a mask with the same length as df, add reindex:

m = df.groupby('id')['fecha'].tail(1).gt(252).reindex(df.index, fill_value=False)
print (m)

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
Name: fecha, dtype: bool
df['new'] = np.where(m, 'yes', 'no')
print (df)
    id     clae6  year  quarter  fecha  fecha_dif2  position new
0    1  475230.0  2007        1    220          -1         1  no
1    1  475230.0  2007        2    221          -1         2  no
2    1  475230.0  2007        3    222          -1         3  no
3    1  475230.0  2007        4    223          -1         4  no
4    1  475230.0  2008        1    224          -1         5  no
5    2  475230.0  2007        1    220          -1         1  no
6    2  475230.0  2007        2    221          -1         2  no
7    2  475230.0  2007        3    222          -1         3  no
8    2  475230.0  2007        4    223          -1         4  no
9    2  475230.0  2008        1    224          -1         5  no
10   3  475230.0  2010        1    232          -1         1  no
11   3  475230.0  2010        2    233          -1         2  no
12   3  475230.0  2010        3    234          -1         3  no
13   3  475230.0  2010        4    235          -1         4  no
14   3  475230.0  2011        1    236          -1         5  no
15   3  475230.0  2011        2    237          -1         6  no
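With the sample data every last fecha is below 252, so the output is 'no' everywhere. The sketch below runs the same steps end to end on a small hypothetical frame in which id 3 ends above the threshold. The new_all_rows column is an assumed variant, not part of the original question: it uses transform('last') to flag every row of a qualifying id instead of only its last row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only id 3 has a last fecha above 252.
df = pd.DataFrame({
    "id":    [1, 1, 2, 2, 3, 3],
    "fecha": [223, 224, 223, 224, 252, 253],
})

# True only on the last row of an id whose fecha exceeds 252.
m = df.groupby("id")["fecha"].tail(1).gt(252).reindex(df.index, fill_value=False)
df["new"] = np.where(m, "yes", "no")

# Assumed variant: broadcast each id's last fecha to all of its rows,
# so the whole id is flagged rather than just its final row.
last_fecha = df.groupby("id")["fecha"].transform("last")
df["new_all_rows"] = np.where(last_fecha > 252, "yes", "no")

print(df)
```

Which of the two columns is appropriate depends on whether the flag describes the final observation or the id as a whole.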