Question

I want to set to zero the last NaN in the run of NaNs at the start of a Pandas DataFrame column. My DataFrame is indexed by timestamps.

Example data

If I have this data:

In [228]: my_df
Out[228]: 
            blah
1990-01-01   NaN
1990-01-02   NaN
1990-01-03   NaN
1990-01-04   NaN
1990-01-05   NaN
1990-01-06     5
1990-01-07     6
1990-01-08     7
1990-01-09   NaN
1990-01-10     9

[10 rows x 1 columns]

I would like to get the following (the value on January 5 is changed):

            blah
1990-01-01   NaN
1990-01-02   NaN
1990-01-03   NaN
1990-01-04   NaN
1990-01-05     0
1990-01-06     5
1990-01-07     6
1990-01-08     7
1990-01-09   NaN
1990-01-10     9

[10 rows x 1 columns]

What I tried

I can get the index of the first row after the last leading NaN:

In [229]: ts = my_df['blah'].first_valid_index()

In [230]: ts
Out[230]: Timestamp('1990-01-06 00:00:00', tz=None)

I found this ugly way of doing it:

my_df['blah'][:ts][-2] = 0

However, if my DataFrame has no NaN at the start, this raises an IndexError. What would a better solution look like (preferably without just writing a for loop)?
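One loop-free way to express "the last NaN of the leading run" is a boolean mask built with a cumulative maximum. This is a minimal sketch, not taken from the thread: it rebuilds the example frame, and the names `leading_nan` and `last_label` are illustrative. It assumes a unique index and, as a design choice, leaves all-NaN columns untouched.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
idx = pd.date_range("1990-01-01", periods=10, freq="D")
my_df = pd.DataFrame({"blah": [np.nan] * 5 + [5, 6, 7, np.nan, 9]}, index=idx)

s = my_df["blah"]
# notna() -> 0/1; cummax() stays 1 once the first valid value is seen,
# so its negation is True exactly on the leading run of NaNs.
leading_nan = ~s.notna().astype(int).cummax().astype(bool)

# Skip columns with no leading NaN (nothing True) and all-NaN columns
# (everything True, so there is no first valid value to stop at).
if leading_nan.any() and not leading_nan.all():
    last_label = leading_nan[leading_nan].index[-1]  # label of last leading NaN
    my_df.loc[last_label, "blah"] = 0

print(my_df)
```

If the column has no leading NaN, the mask is all False and nothing is written, so no IndexError handling is needed.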

Answer 1

Perhaps just handle the IndexError with try..except:

try:
    # .iloc[-2] is the row just before ts; IndexError if ts is the first row
    df.loc[:ts, 'blah'].iloc[-2] = 0
except IndexError:
    pass

or with an if statement:

s = df.loc[:ts, 'blah']
if len(s) > 1:  # at least one row before ts
    s.iloc[-2] = 0

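Since :ts is a basic slice, s is a view of df, so modifying s also modifies df. This relies on pandas returning a view: with Copy-on-Write enabled (the default in pandas 3.0), s behaves as a copy and df is left unchanged, so assigning through df.loc with the target label is the safer form.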
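A variant of the if-statement above that writes through `df.loc` with the target label, so it does not depend on `s` being a view. This is a sketch under the assumption of a unique, sorted DatetimeIndex; the extra `ts is not None` guard is an addition for all-NaN columns, where `first_valid_index()` returns None and `df.loc[:None]` would select every row.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
idx = pd.date_range("1990-01-01", periods=10, freq="D")
df = pd.DataFrame({"blah": [np.nan] * 5 + [5, 6, 7, np.nan, 9]}, index=idx)

ts = df["blah"].first_valid_index()  # label of the first non-NaN value

if ts is not None:
    s = df.loc[:ts, "blah"]   # label slice, includes ts itself
    if len(s) > 1:            # at least one row before ts
        # Assign via df.loc using the label, instead of mutating s.
        df.loc[s.index[-2], "blah"] = 0

print(df)
```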

How to find the previous row in a DataFrame with a Timestamp index?
