Python / pandas: filling NaN values

Time: 2020-09-06 21:34:24

Tags: dataframe

Out[2]:
   GROUP  START  END
0    1.0    2.0  5.0
1    1.0    5.0  7.0
2    2.0    3.0  6.0
3    2.0    NaN  NaN
4    2.0    NaN  NaN
5    2.0    8.0  4.0

Desired output:

Out[2]:
   GROUP  START  END
0    1.0    2.0  5.0
1    1.0    5.0  7.0
2    2.0    3.0  6.0
3    2.0    6.0  7.0
4    2.0    7.0  8.0
5    2.0    8.0  4.0

2 answers:

Answer 0 (score: 1)

Try:

blocks = df['GROUP'].ne(df['GROUP'].shift()).cumsum()
df['END'] = df['END'].fillna(df.fillna(1).groupby(blocks)['END'].cumsum()) 
df['START'] = df['START'].fillna(df['END'].shift())
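
To check the snippet above, here is a minimal runnable sketch that rebuilds the sample frame from the question and applies the same three steps. The intermediate name running_end is introduced here only for readability. Note the assumption baked into fillna(1): each missing END is taken to be 1 more than the row above it, and the running sum only gives that result when a single valid row precedes the NaN run inside its block, as in this data.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (column names taken from the answers).
df = pd.DataFrame({
    "GROUP": [1.0, 1.0, 2.0, 2.0, 2.0, 2.0],
    "START": [2.0, 5.0, 3.0, np.nan, np.nan, 8.0],
    "END":   [5.0, 7.0, 6.0, np.nan, np.nan, 4.0],
})

# Label each run of consecutive equal GROUP values: 1, 1, 2, 2, 2, 2.
blocks = df["GROUP"].ne(df["GROUP"].shift()).cumsum()

# Treat every NaN as an increment of 1, then take a running sum per block.
running_end = df.fillna(1).groupby(blocks)["END"].cumsum()

# Only the NaN slots are taken from the running sum (6 + 1 = 7, 7 + 1 = 8).
df["END"] = df["END"].fillna(running_end)

# A missing START is the END of the row above it.
df["START"] = df["START"].fillna(df["END"].shift())

print(df)
```

If the real data has a step other than 1, or several valid rows before a NaN run in the same block, the running sum will not match and an interpolation-style approach is likely the safer choice.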

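Answer 1 (score: 0)

For your case there is no built-in vectorized solution, but you can solve it by iterating over the DataFrame once and handling each run of NaN values in turn.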

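import numpy as np  # needed for np.nan in the final replace

# initialize starting and ending values:
# a missing START takes the previous row's END,
# a missing END takes the next row's START
df['START'] = df['START'].mask(df['START'].isna(), df['END'].shift())
df['END'] = df['END'].mask(df['END'].isna(), df['START'].shift(-1))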

while df['END'].isna().any():
    i = df['END'].loc[df['END'].isna()].index[0] # get idx of first NaN
    k = df['END'].loc[i:].loc[~df['END'].isna()].index[0] # get idx of next valid
    if df.loc[i, 'GROUP'] != df.loc[k, 'GROUP']:
        # you did not specify what to do in case a group started or ended in NaN
        # this will replace with a temp string and later replace back to NaN
        df.loc[i:k, 'START':'END'] = 'temp'
        continue
    
    n = k - i + 1
    start = df.loc[i, 'START'] # get value
    end = df.loc[k, 'END'] # get value
    delta = (end - start) / n
    df.loc[i:k, 'START':'END'] = [
        [start + row * delta, start + (row + 1) * delta]
        for row in range(n)
    ]

df = df.replace('temp', np.nan)

Output

   GROUP  START  END
0      1    2.0  5.0
1      1    5.0  7.0
2      2    3.0  6.0
3      2    6.0  7.0
4      2    7.0  8.0
5      2    8.0  4.0
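
The arithmetic inside the loop can be looked at in isolation. The sketch below pulls the list comprehension out into a standalone function; split_interval is a hypothetical name used only for illustration. It divides the span between the first START and the last END of a NaN run into n equal, consecutive segments.

```python
def split_interval(start, end, n):
    """Split [start, end] into n consecutive, equally wide [start, end] pairs."""
    delta = (end - start) / n
    return [[start + row * delta, start + (row + 1) * delta] for row in range(n)]

# Rows 3 and 4 of the sample: START seeded to 6.0, END seeded to 8.0, n = 2.
print(split_interval(6.0, 8.0, 2))  # [[6.0, 7.0], [7.0, 8.0]]
```

Each segment's END equals the next segment's START, which is why the filled rows chain together in the output above.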

Note that some extra error handling is needed to cover the case where the first or last row of the DataFrame is NaN.
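
One possible way to add that handling is to locate the NaN runs up front and skip any run that touches the first or last row, since such a run has no valid neighbour on one side to interpolate from. This is only a sketch: nan_runs and is_interior are hypothetical helper names, and it assumes positional (0-based) row numbering.

```python
import numpy as np
import pandas as pd

def nan_runs(s):
    """Return (first, last) positional bounds of each consecutive NaN run in s."""
    runs = []
    run_start = None
    for pos, missing in enumerate(s.isna().to_numpy()):
        if missing and run_start is None:
            run_start = pos                      # a new run begins here
        elif not missing and run_start is not None:
            runs.append((run_start, pos - 1))    # the run ended on the previous row
            run_start = None
    if run_start is not None:                    # run reaches the last row
        runs.append((run_start, len(s) - 1))
    return runs

def is_interior(run, length):
    """True if the run has a valid row both before and after it."""
    first, last = run
    return first > 0 and last < length - 1

end = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])
runs = nan_runs(end)
print(runs)                                             # [(0, 0), (2, 3), (5, 5)]
print([r for r in runs if is_interior(r, len(end))])    # [(2, 3)]
```

Only the interior runs would then be passed to the interpolation step; what to do with the edge runs (leave as NaN, forward fill, raise) depends on the data and is left open here.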