Question

I have a dataframe like this:

df

col1    col2
 1        A 
 3        B
 6        A
 10       C

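I want to build a new dataframe from the df above: wherever the col1 values are not consecutive, it should insert extra rows for the missing col1 values, with col2 taking the value from the row above.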

The dataframe I am looking for is:

df
col1    col2
 1        A
 2        A
 3        B
 4        B
 5        B
 6        A
 7        A
 8        A
 9        A
 10       C

I can do this with a simple for loop, but is there a more pythonic and efficient way to do it with pandas?
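For comparison, here is a minimal sketch of the plain for-loop approach the question mentions. It assumes col1 holds integers sorted in ascending order; the function name `fill_gaps_loop` is hypothetical.

```python
import pandas as pd

# Sample data from the question; col1 is assumed sorted ascending.
df = pd.DataFrame({'col1': [1, 3, 6, 10], 'col2': ['A', 'B', 'A', 'C']})

def fill_gaps_loop(frame):
    """Hypothetical loop baseline: for each row, emit every integer from its
    col1 value up to (but not including) the next row's col1 value,
    carrying that row's col2 value along."""
    col1 = frame['col1'].tolist()
    col2 = frame['col2'].tolist()
    rows = []
    for i, (start, label) in enumerate(zip(col1, col2)):
        # The last row has no successor, so it contributes only itself.
        stop = col1[i + 1] if i + 1 < len(col1) else start + 1
        for value in range(start, stop):
            rows.append((value, label))
    return pd.DataFrame(rows, columns=['col1', 'col2'])

result = fill_gaps_loop(df)
print(result)
```

This works, but it builds the output row by row in Python, which is what the vectorized pandas answers avoid.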

Answer 1

Here is one way, using set_index() and reindex() together with ffill():

df.set_index('col1').reindex(range(df.col1.min(),df.col1.max()+1)).ffill().reset_index()

# Alternative: df.set_index('col1').reindex(range(df.col1.min(), df.col1.max()+1), method='ffill').reset_index()
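A self-contained version of this approach, with the sample data rebuilt from the question so it can be run directly. It also runs the method='ffill' variant from the comment; that variant relies on col1 being sorted, because reindex with a fill method requires a monotonic index.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'col1': [1, 3, 6, 10], 'col2': ['A', 'B', 'A', 'C']})

# Every integer between the smallest and largest col1 value, inclusive.
full_range = range(df['col1'].min(), df['col1'].max() + 1)

result = (
    df.set_index('col1')      # make col1 the index so reindex can target it
      .reindex(full_range)    # missing col1 values become all-NaN rows
      .ffill()                # copy col2 down from the previous existing row
      .reset_index()          # turn col1 back into an ordinary column
)

# Variant: let reindex do the forward fill itself (index must be monotonic).
result_method = (
    df.set_index('col1')
      .reindex(full_range, method='ffill')
      .reset_index()
)
print(result)
```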

Answer 2

One way is to combine reindex with ffill:

(df.set_index('col1')
   .reindex(range(df.col1.iloc[0], df.col1.iloc[-1]+1))
   .ffill()
   .reset_index())

    col1 col2
0     1    A
1     2    A
2     3    B
3     4    B
4     5    B
5     6    A
6     7    A
7     8    A
8     9    A
9    10    C
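This version takes the range bounds from the first and last rows (iloc[0] and iloc[-1]), which only matches min and max when col1 is already sorted ascending, as it is in the question. If the input might arrive unsorted (an assumption beyond the question's data), a sketch that sorts first:

```python
import pandas as pd

# Hypothetical unsorted input: the question's rows in shuffled order.
df = pd.DataFrame({'col1': [6, 1, 10, 3], 'col2': ['A', 'A', 'C', 'B']})

# Sort so that iloc[0] / iloc[-1] are the smallest and largest col1 values.
# reindex then lays the rows out in range order before ffill runs.
ordered = df.sort_values('col1')

result = (ordered.set_index('col1')
                 .reindex(range(ordered.col1.iloc[0], ordered.col1.iloc[-1] + 1))
                 .ffill()
                 .reset_index())
print(result)
```

Without the sort, the bounds here would be 6 and 3, giving an empty range.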

Another way is to use Series.repeat:

# Repeat each col2 value by the gap to the next col1 value (the last row once)
df.col2.repeat(df.col1.diff().shift(-1).fillna(1).astype(int)).reset_index(drop=True)
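The repeat approach only produces the expanded col2 column. A minimal sketch that assembles the full two-column frame, assuming col1 is sorted ascending with no duplicates, so the expanded col1 is simply the consecutive integer range from min to max:

```python
import pandas as pd

# Sample data from the question; col1 assumed sorted with no duplicates.
df = pd.DataFrame({'col1': [1, 3, 6, 10], 'col2': ['A', 'B', 'A', 'C']})

# How many output rows each input row expands to: the gap to the next col1
# value, with the last row (no successor) kept once. diff() yields floats
# and a NaN, so fill it and cast to int before passing to repeat().
counts = df['col1'].diff().shift(-1).fillna(1).astype(int)

col2 = df['col2'].repeat(counts).reset_index(drop=True)

# The expanded col1 is just the consecutive integer range.
result = pd.DataFrame({
    'col1': range(df['col1'].min(), df['col1'].max() + 1),
    'col2': col2,
})
print(result)
```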
