How to pass values through only where there are x consecutive non-null values in pandas?

Time: 2019-10-22 04:23:31

Tags: python pandas time-series

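I have a time series of monthly temperature anomalies covering 60 years. I only want to pass through the temperature values that belong to stretches of six or more consecutive months where the anomaly is greater than 0.5. Replacing the values < 0.5 with NaN is easy, but I am not sure how to also replace values that are > 0.5 when only 2 or 3 consecutive values exceed 0.5. Code snippet below: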

time = [1950.04167, 1950.125  , 1950.20833, 1950.29167, 1950.375  ,
       1950.45833, 1950.54167, 1950.625  , 1950.70833, 1950.79167,
       1950.875  , 1950.95833, 1951.04167, 1951.125  , 1951.20833,
       1951.29167, 1951.375  , 1951.45833, 1951.54167, 1951.625  ,
       1951.70833, 1951.79167, 1951.875  , 1951.95833, 1952.04167,
       1952.125  , 1952.20833, 1952.29167, 1952.375  , 1952.45833,
       1952.54167, 1952.625  , 1952.70833, 1952.79167, 1952.875  ,
       1952.95833, 1953.04167, 1953.125  , 1953.20833, 1953.29167,
       1953.375  , 1953.45833, 1953.54167, 1953.625  , 1953.70833,
       1953.79167, 1953.875  , 1953.95833, 1954.04167, 1954.125  ,
       1954.20833, 1954.29167, 1954.375  , 1954.45833, 1954.54167,
       1954.625  , 1954.70833, 1954.79167, 1954.875  , 1954.95833]


sst = [-1.67623 , -1.685853, -1.69083 , -1.61898 , -1.40235 ,
       -1.097773, -0.835867, -0.718727, -0.694087, -0.785423,
       -0.9312  , -1.01925 , -0.8868  , -0.48022 , -0.007597,
        0.448647,  0.66546 ,  0.852427, 0.922443,  1.14481 ,
        1.291153,  1.338903,  0.993053,  0.68006, 0.493597,
        0.500197,  0.528363,  0.515583,  0.418493,  0.168387,
       -0.003403,  0.033933,  0.15759 ,  0.113847,  0.019967,
        0.111413, 0.372967,  0.623067,  0.763903,  0.909743,
        0.990287,  1.01288 , 0.969407,  0.985817,  0.982607,
        1.01244 ,  1.039917,  1.11755, 1.044333,  0.799593,
        0.3769  ,  0.105033, -0.070743, -0.281483, -0.59861,
        -0.875743, -0.88768 , -0.642517, -0.548043, -0.547057]


import pandas as pd

series = pd.Series(index=time, data=sst)
# Keep values >= 0.5; every other value becomes NaN
greater = series.where(cond=(series >= 0.5))

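So, for example, I would like to "pass through" the SST values for the time spans 1951.375 to 1951.95833 and 1953.125 to 1954.125, where SST is greater than 0.5 for 8 and 13 consecutive values respectively, but replace with NaN the SST values for 1952.125 to 1952.29167, where only 3 consecutive values are > 0.5.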

Any suggestions? TIA!

1 answer:

Answer 0: (score: 0)

You can find the lengths of the runs of values > 0.5 with series.groupby(series.le(0.5).cumsum()), then use .apply() to replace the values in runs that are too short with NaN.

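The .groupby lumps the preceding <= 0.5 value together with each run, so every group is one element longer than the run itself. We therefore keep only groups of length 5 or more and replace the first value of each kept group with np.nan. With this cutoff, runs of four or more values pass through; to require six consecutive months as in the question, the check would be len(x) < 7.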
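To see what the grouping key looks like, here is a small sketch on a toy series. The values and the names `toy`, `key` and `sizes` are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Toy data, made up for illustration: a run of two values > 0.5,
# then a run of three values > 0.5.
toy = pd.Series([0.1, 0.7, 0.9, 0.2, 0.6, 0.8, 1.0, 0.3])

# le(0.5) is True at every value <= 0.5; the cumulative sum increases
# by one at each of those positions, so it can serve as a group label.
key = toy.le(0.5).cumsum()
print(key.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3]

# Each group size counts the leading <= 0.5 value plus the run after it.
sizes = toy.groupby(key).size()
print(sizes.tolist())  # [3, 4, 1]
```

Each group size is the run length plus one, which is why the length check in the answer's snippet is offset by one from the desired run length.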

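In [61]: (
    series
    # a new group starts at every value <= 0.5, so each group is one
    # <= 0.5 value followed by the run of > 0.5 values after it;
    # group_keys=False keeps the original index on recent pandas versions
    .groupby(series.le(0.5).cumsum(), group_keys=False)
    # short groups become all NaN; otherwise only the leading value does
    .apply(lambda x: pd.Series(np.nan if len(x) < 5 else [np.nan] + list(x)[1:], x.index))
)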
Out[61]:
1950.04167         NaN
1950.12500         NaN
1950.20833         NaN
1950.29167         NaN
1950.37500         NaN
1950.45833         NaN
1950.54167         NaN
1950.62500         NaN
1950.70833         NaN
1950.79167         NaN
1950.87500         NaN
1950.95833         NaN
1951.04167         NaN
1951.12500         NaN
1951.20833         NaN
1951.29167         NaN
1951.37500    0.665460
1951.45833    0.852427
1951.54167    0.922443
1951.62500    1.144810
1951.70833    1.291153
1951.79167    1.338903
1951.87500    0.993053
1951.95833    0.680060
1952.04167         NaN
1952.12500         NaN
1952.20833         NaN
1952.29167         NaN
1952.37500         NaN
1952.45833         NaN
1952.54167         NaN
1952.62500         NaN
1952.70833         NaN
1952.79167         NaN
1952.87500         NaN
1952.95833         NaN
1953.04167         NaN
1953.12500    0.623067
1953.20833    0.763903
1953.29167    0.909743
1953.37500    0.990287
1953.45833    1.012880
1953.54167    0.969407
1953.62500    0.985817
1953.70833    0.982607
1953.79167    1.012440
1953.87500    1.039917
1953.95833    1.117550
1954.04167    1.044333
1954.12500    0.799593
1954.20833         NaN
1954.29167         NaN
1954.37500         NaN
1954.45833         NaN
1954.54167         NaN
1954.62500         NaN
1954.70833         NaN
1954.79167         NaN
1954.87500         NaN
1954.95833         NaN
dtype: float64
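As a possible alternative, the same filtering can be sketched without .apply() by labelling runs directly and using a boolean mask. This is not the answer's method: the function name `keep_long_runs` and its parameters `threshold` and `min_run` are hypothetical, the toy values are made up, and the comparison is strictly > threshold, whereas the question's own snippet uses >=.

```python
import pandas as pd


def keep_long_runs(series, threshold=0.5, min_run=6):
    """Keep values that sit in a run of at least `min_run` consecutive
    values above `threshold`; replace everything else with NaN."""
    above = series > threshold
    # The run label changes whenever `above` flips between True and False.
    run_id = (above != above.shift()).cumsum()
    # Length of the run that each element belongs to.
    run_len = above.groupby(run_id).transform("size")
    return series.where(above & (run_len >= min_run))


# Toy data, made up for illustration; min_run=3 keeps only the middle run.
toy = pd.Series([0.9, 0.8, 0.1, 0.6, 0.7, 0.9, 0.2, 0.7])
result = keep_long_runs(toy, threshold=0.5, min_run=3)
print(result.tolist())  # [nan, nan, nan, 0.6, 0.7, 0.9, nan, nan]
```

Because runs are labelled on the boolean mask itself, the run length needs no off-by-one adjustment, and a run at the very start of the series is handled like any other. For the question's data the call would be keep_long_runs(series, 0.5, 6).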