I have a pandas DataFrame that looks like this:
TIMESTAMP TAIR
0 2011-06-01 00:00:00 24.3
1 2011-06-01 00:05:00 24.5
2 2011-06-01 00:10:00 24.2
3 2011-06-01 00:15:00 24.1
4 2011-06-01 00:20:00 24.2
5 2011-06-01 00:25:00 -999
6 2011-06-01 00:30:00 15.1
7 2011-06-01 00:35:00 -999
8 2011-06-01 00:40:00 13.9
9 2011-06-01 00:45:00 13.7
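I need to handle the missing values (any value less than -990) by replacing each one with the previous value. So if I do this correctly, the new DataFrame should look like this: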
TIMESTAMP TAIR
0 2011-06-01 00:00:00 24.3
1 2011-06-01 00:05:00 24.5
2 2011-06-01 00:10:00 24.2
3 2011-06-01 00:15:00 24.1
4 2011-06-01 00:20:00 24.2
5 2011-06-01 00:25:00 24.2
6 2011-06-01 00:30:00 15.1
7 2011-06-01 00:35:00 15.1
8 2011-06-01 00:40:00 13.9
9 2011-06-01 00:45:00 13.7
The TIMESTAMP column has a datetime dtype.
The way I do this right now is with a for loop, like so:
for index, row in df.iterrows():
    if row['TAIR'] < -990:
        # Copy the previous row's value over the missing one
        df.loc[index, 'TAIR'] = df.loc[index - 1, 'TAIR']
Is there a better/faster way?
Answer 0 (score: 4)
Use mask to turn the sentinel values into NaN, then ffill to carry the previous value forward:
df.assign(TAIR=df.TAIR.mask(df.TAIR.lt(-990)).ffill())
TIMESTAMP TAIR
0 2011-06-01 00:00:00 24.3
1 2011-06-01 00:05:00 24.5
2 2011-06-01 00:10:00 24.2
3 2011-06-01 00:15:00 24.1
4 2011-06-01 00:20:00 24.2
5 2011-06-01 00:25:00 24.2
6 2011-06-01 00:30:00 15.1
7 2011-06-01 00:35:00 15.1
8 2011-06-01 00:40:00 13.9
9 2011-06-01 00:45:00 13.7
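For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the mask + ffill approach. The DataFrame is rebuilt from the sample in the question, and the condition uses the question's stated threshold (anything below -990); the variable name `filled` is just an illustrative choice.

```python
import pandas as pd

# Sample data rebuilt from the question (5-minute readings).
df = pd.DataFrame({
    "TIMESTAMP": pd.date_range("2011-06-01 00:00:00", periods=10, freq="5min"),
    "TAIR": [24.3, 24.5, 24.2, 24.1, 24.2, -999, 15.1, -999, 13.9, 13.7],
})

# mask() replaces every value where the condition is True with NaN;
# ffill() then propagates the last valid value forward.
# assign() returns a new DataFrame, so df itself is left unchanged.
filled = df.assign(TAIR=df["TAIR"].mask(df["TAIR"].lt(-990)).ffill())

print(filled)
```

Because both steps are vectorized, this avoids the per-row overhead of iterrows(), which is usually where the loop version loses most of its time.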
Answer 1 (score: 2)
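Replace the sentinel values with np.nan, then use ffill(). Note that ffill() returns a new DataFrame, so assign the result back:
import numpy as np
df.loc[df.TAIR <= -990, 'TAIR'] = np.nan
df = df.ffill()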
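One edge case worth knowing about: if the very first reading is missing, there is no previous value to copy, so a forward fill leaves it as NaN. The sketch below wraps the replace-then-ffill idea in a hypothetical helper, `fill_sentinels` (not part of pandas), and runs it on a small made-up series that also contains two consecutive sentinels.

```python
import numpy as np
import pandas as pd

def fill_sentinels(series, threshold=-990):
    """Hypothetical helper: treat values below `threshold` as missing, then forward-fill."""
    out = series.astype(float)       # astype returns a copy, so the input is not modified
    out[out < threshold] = np.nan    # mark sentinel readings as missing
    return out.ffill()               # carry the last valid reading forward

# Made-up readings: a leading sentinel and two consecutive sentinels.
tair = pd.Series([-999, 24.5, -999, -999, 13.9])
result = fill_sentinels(tair)

print(result)
```

The leading value stays NaN while the consecutive sentinels both pick up 24.5. How to treat a leading gap (drop it, back-fill it, or leave it) depends on the data, so that choice is left open here.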