Get the last N rows from a specific position in a pandas DataFrame based on a value

Time: 2020-07-30 18:15:15

Tags: python python-3.x pandas numpy dataframe

I have a dataset like this:

Sno change  date
0   NaN 2017-01-01
1   NaN 2017-02-01
2   NaN 2017-03-01
3   NaN 2017-04-01
4   NaN 2017-05-01
5   NaN 2017-06-01
6   NaN 2017-07-01
7   NaN 2017-08-01
8   0.0 2017-09-01
9   NaN 2017-10-01
10  NaN 2017-11-01
11  1   2017-12-01
12  NaN 2018-01-01
13  NaN 2018-02-01

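Whenever the value in the "change" column changes from NaN to something else, I want to get the last 5 rows of the "date" column leading up to that point. So for this example, the result would be two sets: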

Sno    date
3   2017-04-01
4   2017-05-01
5   2017-06-01
6   2017-07-01
7   2017-08-01
8   2017-09-01

Sno    date
6   2017-07-01
7   2017-08-01
8   2017-09-01
9   2017-10-01
10  2017-11-01
11  2017-12-01

Can someone help me get this? Thanks.

2 answers:

Answer 0: (score: 1)

You can use isna() to check for NaN values, then np.where to extract the positions where a non-NaN value follows a NaN, and finally np.r_ to create the slices:

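import numpy as np

s = df.change.isna()

# positions where the previous row is NaN and the current row is not
valids = np.where(s.shift() & (~s))[0]

# the five rows before each such position
[df.iloc[np.r_[x-5:x]] for x in valids]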

[   Sno  change        date
 3    3     NaN  2017-04-01
 4    4     NaN  2017-05-01
 5    5     NaN  2017-06-01
 6    6     NaN  2017-07-01
 7    7     NaN  2017-08-01,
     Sno  change        date
 6     6     NaN  2017-07-01
 7     7     NaN  2017-08-01
 8     8     0.0  2017-09-01
 9     9     NaN  2017-10-01
 10   10     NaN  2017-11-01]
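
Note that the output above holds the five rows before each change and keeps all columns, while the expected output in the question also includes the row where the change occurs and shows only the date. A minimal sketch of that variant follows. The construction of df and the name n are assumptions made for illustration and are not part of the original answer; shift(fill_value=False) is used here so the mask stays boolean.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (assumed construction).
df = pd.DataFrame({
    "Sno": range(14),
    "change": [np.nan] * 8 + [0.0] + [np.nan] * 2 + [1.0] + [np.nan] * 2,
    "date": pd.date_range("2017-01-01", periods=14, freq="MS"),
})

n = 5  # number of preceding rows to keep (hypothetical parameter name)

s = df["change"].isna()
# Boolean mask: previous row is NaN and the current row is not.
mask = (s.shift(fill_value=False) & ~s).to_numpy()
valids = np.flatnonzero(mask)

# Positional slice from n rows before each change through the change row itself.
groups = [df.iloc[max(x - n, 0):x + 1][["Sno", "date"]] for x in valids]
```

The max(x - n, 0) guard only matters when a change occurs within the first n rows; without it a negative start would be counted from the end of the frame.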

Answer 1: (score: 1)

You can try something like this, using loc and isna:

#df=df.set_index('Sno')
idxs=df.index[~df.change.isna()]
sets=[df.loc[i-5:i,['date']] for i in idxs]

Output:

sets
[           date
 Sno            
 3    2017-04-01
 4    2017-05-01
 5    2017-06-01
 6    2017-07-01
 7    2017-08-01
 8    2017-09-01,

            date
 Sno            
 6    2017-07-01
 7    2017-08-01
 8    2017-09-01
 9    2017-10-01
 10   2017-11-01
 11   2017-12-01]
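
Two properties of this approach are worth keeping in mind. Label slices in loc include both endpoints, so i-5:i returns six rows, and the arithmetic assumes consecutive integer index labels. The selection also picks every non-NaN row, not only those that follow a NaN. The sketch below restricts the selection to NaN-to-value transitions. The frame construction and the extra value at Sno 9 are assumptions added purely to show the difference; they are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, indexed by Sno (assumed construction).
df = pd.DataFrame({
    "change": [np.nan] * 8 + [0.0] + [np.nan] * 2 + [1.0] + [np.nan] * 2,
    "date": pd.date_range("2017-01-01", periods=14, freq="MS"),
}, index=pd.Index(range(14), name="Sno"))

# Hypothetical extra value right after the first change, for illustration only.
df.loc[9, "change"] = 2.0

not_na = df["change"].notna()
prev_na = df["change"].isna().shift(fill_value=False)

# Every non-NaN row (as in the answer) versus only NaN -> value transitions.
all_idxs = df.index[not_na]
transition_idxs = df.index[not_na & prev_na]

# loc slices by label and includes both endpoints, so each set has six rows.
sets = [df.loc[i - 5:i, ["date"]] for i in transition_idxs]
```

With the extra value in place, all_idxs contains Sno 9 while transition_idxs does not, because the row before it is not NaN.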