Out[2]:
   GROUP  START  END
0    1.0    2.0  5.0
1    1.0    5.0  7.0
2    2.0    3.0  6.0
3    2.0    NaN  NaN
4    2.0    NaN  NaN
5    2.0    8.0  4.0
Desired output:
Out[2]:
   GROUP  START  END
0    1.0    2.0  5.0
1    1.0    5.0  7.0
2    2.0    3.0  6.0
3    2.0    6.0  7.0
4    2.0    7.0  8.0
5    2.0    8.0  4.0
Answer 0 (score: 1)
Try:
blocks = df['GROUP'].ne(df['GROUP'].shift()).cumsum()
df['END'] = df['END'].fillna(df.fillna(1).groupby(blocks)['END'].cumsum())
df['START'] = df['START'].fillna(df['END'].shift())
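For anyone who wants to run this, here is a minimal sketch that rebuilds the sample frame from the question (the construction of `df` is my assumption; the three fill lines are unchanged). As far as I can tell, it works because `fillna(1)` turns each missing END into an increment of 1 and `cumsum` then counts up within each consecutive GROUP block. Note that the cumulative sum includes every earlier END in the block, so it matches the desired output here only because a single known END precedes the gap.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed construction).
df = pd.DataFrame({
    'GROUP': [1.0, 1.0, 2.0, 2.0, 2.0, 2.0],
    'START': [2.0, 5.0, 3.0, np.nan, np.nan, 8.0],
    'END':   [5.0, 7.0, 6.0, np.nan, np.nan, 4.0],
})

# Label each run of consecutive equal GROUP values: 1, 1, 2, 2, 2, 2.
blocks = df['GROUP'].ne(df['GROUP'].shift()).cumsum()

# Missing END values become +1 steps on top of the running sum in the block.
df['END'] = df['END'].fillna(df.fillna(1).groupby(blocks)['END'].cumsum())

# A missing START is taken from the END of the previous row.
df['START'] = df['START'].fillna(df['END'].shift())

print(df)
```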
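Answer 1 (score: 0)
There is no built-in vectorized solution for your case, but you can solve it by iterating over the frame once and handling each NaN section in turn.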
import numpy as np  # needed for np.nan at the end

# initialize starting and ending values
df['START'] = df['START'].mask(df['START'].isna(), df['END'].shift())
df['END'] = df['END'].mask(df['END'].isna(), df['START'].shift(-1))
while df['END'].isna().any():
    i = df['END'].loc[df['END'].isna()].index[0]  # get idx of first NaN
    k = df['END'].loc[i:].loc[~df['END'].isna()].index[0]  # get idx of next valid
    if df.loc[i, 'GROUP'] != df.loc[k, 'GROUP']:
        # you did not specify what to do in case a group started or ended in NaN
        # this will replace with a temp string and later replace back to NaN
        df.loc[i:k, 'START':'END'] = 'temp'
        continue
    n = k - i + 1  # number of rows in this NaN section
    start = df.loc[i, 'START']  # get value
    end = df.loc[k, 'END']  # get value
    delta = (end - start) / n  # equal step per row
    df.loc[i:k, 'START':'END'] = [
        [start + row * delta, start + (row + 1) * delta]
        for row in range(n)
    ]
df = df.replace('temp', np.nan)
Output
GROUP START END
0 1 2.0 5.0
1 1 5.0 7.0
2 2 3.0 6.0
3 2 6.0 7.0
4 2 7.0 8.0
5 2 8.0 4.0
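To make the interpolation step concrete, here is the arithmetic for the one NaN section in the sample data, written out as a small sketch. The numbers are read off the example: row 3 takes its START from the END of row 2 (6.0), and row 4 takes its END from the START of row 5 (8.0).

```python
# Boundaries of the NaN section covering rows 3 and 4 in the example.
start, end = 6.0, 8.0
n = 2  # number of rows in the section

# Each row spans one equal-sized step between start and end.
delta = (end - start) / n
rows = [[start + r * delta, start + (r + 1) * delta] for r in range(n)]

print(delta)  # 1.0
print(rows)   # [[6.0, 7.0], [7.0, 8.0]]
```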
Note that some error handling is needed to cover the case where the first or last row of the DataFrame is NaN.
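One possible way to guard against that, as a rough sketch only: the two `mask` steps borrow START from the previous row's END and END from the next row's START, so a missing START in the first row or a missing END in the last row has nothing to borrow from. The helper name `has_open_edges` is hypothetical (not part of pandas), and what to do when it returns True is left to you.

```python
import numpy as np
import pandas as pd


def has_open_edges(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if the first START or the last END is NaN."""
    return bool(pd.isna(df['START'].iloc[0]) or pd.isna(df['END'].iloc[-1]))


# Sample frame from the question: both edges are known.
ok = pd.DataFrame({
    'GROUP': [1, 1, 2, 2, 2, 2],
    'START': [2.0, 5.0, 3.0, np.nan, np.nan, 8.0],
    'END':   [5.0, 7.0, 6.0, np.nan, np.nan, 4.0],
})

# Made-up frame whose last END is missing, so no later row can supply it.
bad = pd.DataFrame({
    'GROUP': [1, 1],
    'START': [2.0, np.nan],
    'END':   [5.0, np.nan],
})

print(has_open_edges(ok))   # False
print(has_open_edges(bad))  # True
```

Calling this before the loop lets you skip, raise, or fill the edge rows explicitly instead of letting the loop fail on them.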