Is there a simple Python way to fill missing values in the n rows before and the n rows after a value?

Time: 2019-07-01 19:15:28

Tags: python pandas numpy

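I have a dataset where each period has a middle value, and I want to copy that value to the n rows before it and the n rows after it. In my case a period is 15 days, so the value should cover the 7 days before the middle row and the 7 days after it. How can I do this?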

I have checked many books and web pages, and there are plenty of references to fillna, but none of them solve my problem, so I have not tried any code yet.

Here is my dataset:

    DATE        RAIN    RR_MIDDLE   CONDITION_RR    CONDITION_PR    SEASON
 0  1983-07-22  0.000   0.00        Dry             Dry            Dry_Season
 1  1983-07-23  NaN      NaN         NaN            NaN            NaN
 2  1983-07-24  NaN      NaN         NaN            NaN            NaN
 .....................................................................
15  1983-08-06  0.000   0.00         Wet            Wet            Wet_Season

I expect the filled table to repeat the middle row's values across its whole season (period), like this:

 DATE           RAIN    RR_MIDDLE   CONDITION_RR    CONDITION_PR    SEASON
0   1983-07-22  0.000   0.00        Dry             Dry            Dry_Season
1   1983-07-23  0.000   0.00        Dry             Dry            Dry_Season
2   1983-07-24  0.000   0.00        Dry             Dry            Dry_Season
3   1983-07-25  0.000   0.00        Dry             Dry            Dry_Season
4   1983-07-26  0.000   0.00        Dry             Dry            Dry_Season
5   1983-07-27  0.000   0.00        Dry             Dry            Dry_Season
6   1983-07-28  0.000   0.00        Dry             Dry            Dry_Season
7   1983-07-29  0.000   0.00        Dry             Dry            Dry_Season
8   1983-07-30  0.000   0.00        Wet             Wet            Wet_Season
9   1983-07-31  0.000   0.00        Wet             Wet            Wet_Season
10  1983-08-01  0.000   0.00        Wet             Wet            Wet_Season
11  1983-08-02  0.000   0.00        Wet             Wet            Wet_Season
12  1983-08-03  0.000   0.00        Wet             Wet            Wet_Season
13  1983-08-04  0.000   0.00        Wet             Wet            Wet_Season
14  1983-08-05  0.000   0.00        Wet             Wet            Wet_Season
15  1983-08-06  0.000   0.00        Wet             Wet            Wet_Season
16  1983-08-07  0.000   0.00        Wet             Wet            Wet_Season
And so on.....
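To experiment with the answer below, a minimal reproduction of the input can be built as follows. It is a hypothetical stand-in based only on the excerpt above: it assumes rows 1 to 14 are entirely NaN and that only rows 0 and 15 carry the middle values; the real dataset has more rows.

```python
import numpy as np
import pandas as pd

def make_sample() -> pd.DataFrame:
    """Hypothetical 16-row reproduction of the excerpt: values only at rows 0 and 15."""
    gap = [np.nan] * 14  # rows 1..14 are assumed to be completely missing
    return pd.DataFrame({
        'DATE': pd.date_range('1983-07-22', periods=16, freq='D').strftime('%Y-%m-%d'),
        'RAIN': [0.0] + gap + [0.0],
        'RR_MIDDLE': [0.0] + gap + [0.0],
        'CONDITION_RR': ['Dry'] + gap + ['Wet'],
        'CONDITION_PR': ['Dry'] + gap + ['Wet'],
        'SEASON': ['Dry_Season'] + gap + ['Wet_Season'],
    })

df = make_sample()
print(df)
```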

1 answer:

Answer 0 (score: 1)

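If you know in advance how many NaNs need to be filled and that number is the same throughout the dataset, the simplest solution is to chain a forward fill and a backward fill, each with the limit parameter: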

df.ffill(limit=7).bfill(limit=7)

          DATE  RAIN  RR_MIDDLE CONDITION_RR CONDITION_PR      SEASON
0   1983-07-22   0.0        0.0          Dry          Dry  Dry_Season
1   1983-07-23   0.0        0.0          Dry          Dry  Dry_Season
2   1983-07-24   0.0        0.0          Dry          Dry  Dry_Season
3   1983-07-25   0.0        0.0          Dry          Dry  Dry_Season
4   1983-07-26   0.0        0.0          Dry          Dry  Dry_Season
5   1983-07-27   0.0        0.0          Dry          Dry  Dry_Season
6   1983-07-28   0.0        0.0          Dry          Dry  Dry_Season
7   1983-07-29   0.0        0.0          Dry          Dry  Dry_Season
8   1983-07-30   0.0        0.0          Wet          Wet  Wet_Season
9   1983-07-31   0.0        0.0          Wet          Wet  Wet_Season
10  1983-08-01   0.0        0.0          Wet          Wet  Wet_Season
11  1983-08-02   0.0        0.0          Wet          Wet  Wet_Season
12  1983-08-03   0.0        0.0          Wet          Wet  Wet_Season
13  1983-08-04   0.0        0.0          Wet          Wet  Wet_Season
14  1983-08-05   0.0        0.0          Wet          Wet  Wet_Season
15  1983-08-06   0.0        0.0          Wet          Wet  Wet_Season
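The same chain can be wrapped in a small helper that takes the half-window size as a parameter. `fill_around` is a hypothetical name, and the sketch assumes every run of NaNs between two middle rows is at most 2n rows long; wider gaps keep NaNs in their centre. The order matters: the forward fill runs first, so the n rows after a middle value take that value, and the backward fill then covers the remaining rows before the next middle value.

```python
import numpy as np
import pandas as pd

def fill_around(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Spread each known value over up to n rows after it and up to n rows before it."""
    # Forward fill first: the n rows after a middle value take that value.
    # Backward fill second: up to n still-empty rows before the next middle value take it.
    return df.ffill(limit=n).bfill(limit=n)

# Small stand-in for the question's data: known values at rows 0 and 15 only.
gap = [np.nan] * 14
df = pd.DataFrame({
    'RAIN': [0.0] + gap + [0.0],
    'SEASON': ['Dry_Season'] + gap + ['Wet_Season'],
})

filled = fill_around(df, 7)
print(filled['SEASON'].tolist())

# With n=3 the 14-row gap is wider than 2*n, so rows 4..11 stay NaN.
print(fill_around(df, 3)['SEASON'].isna().sum())
```

With the question's 15-day periods, n is 7, which gives the 8 Dry rows and 8 Wet rows shown in the output above.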

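Otherwise, you need to use interpolate with method='nearest'. However, it only works on numeric types (and method='nearest' requires SciPy), so we need to convert each string column to integers, interpolate, and convert back.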

import pandas as pd

str_cols = ['CONDITION_RR', 'CONDITION_PR', 'SEASON']

d = {}  # Holds mapping from str values to integers
for col in str_cols:
    u = df[col].dropna().unique()
    d[col] = dict(zip(u, range(len(u))))
    df[col] = df[col].map(d[col])  # Map unique values to integers

# Interpolate every column except DATE, which holds no numbers to interpolate
num_cols = df.columns.drop('DATE')
df[num_cols] = df[num_cols].interpolate(method='nearest')  # requires SciPy

# Map the integer codes back to the original strings
for col in str_cols:
    rev_d = {v: k for k, v in d[col].items()}
    df[col] = df[col].map(rev_d)
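If SciPy is not available, the "copy from the nearest middle row" idea can also be done with NumPy alone. This is a swapped-in alternative, not the answer's method: `nearest_anchor_fill` is a hypothetical helper that finds the positions where a key column (here assumed to be SEASON) is known and uses `np.searchsorted` to pick the closest one for every row. Because it copies whole rows by position, the string columns need no integer mapping. It assumes all filled columns are known on the same rows as the key column. Unlike the `limit` approach it does not cap the distance, so rows before the first or after the last middle row are filled too.

```python
import numpy as np
import pandas as pd

def nearest_anchor_fill(df: pd.DataFrame, key: str, skip=('DATE',)) -> pd.DataFrame:
    """Copy every column except `skip` from the nearest row (by position) where `key` is known."""
    anchors = np.flatnonzero(df[key].notna().to_numpy())  # positions of the middle rows
    if anchors.size == 0:
        return df.copy()  # nothing to copy from
    pos = np.arange(len(df))
    # First anchor at or after each row, and the anchor just before it (clipped to valid indices).
    right = np.clip(np.searchsorted(anchors, pos), 0, anchors.size - 1)
    left = np.clip(right - 1, 0, anchors.size - 1)
    # Choose the closer anchor; on a tie the earlier one wins.
    use_left = np.abs(pos - anchors[left]) <= np.abs(anchors[right] - pos)
    nearest = np.where(use_left, anchors[left], anchors[right])
    out = df.copy()
    for col in df.columns:
        if col not in skip:
            out[col] = df[col].to_numpy()[nearest]  # column by column, so dtypes are kept
    return out

# Hypothetical 17-row sample: middle rows at 0 and 15, plus one trailing empty row.
gap = [np.nan] * 14
df = pd.DataFrame({
    'DATE': pd.date_range('1983-07-22', periods=17, freq='D').strftime('%Y-%m-%d'),
    'RAIN': [0.0] + gap + [0.0, np.nan],
    'SEASON': ['Dry_Season'] + gap + ['Wet_Season', np.nan],
})

filled = nearest_anchor_fill(df, 'SEASON')
print(filled['SEASON'].tolist())
```

Rows 1 to 7 are closer to row 0 and rows 8 to 16 are closer to row 15, which matches the expected table in the question, including row 16.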