Question

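I am trying to multiply (repeat) rows in a DataFrame based on a condition column.

For example, when the value in the condition column is 2, I want to replace that row with two identical rows and set condition to 1 in each new row. More generally, a row whose condition is n should appear n times.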

Example DataFrame:

import pandas as pd

df = pd.DataFrame({'k': ['K0', 'K1', 'K1', 'K2'],
                   'condition': [1, 1, 3, 2],
                   's': ['a', 'b', 'c', 'd']})


    condition   k  s
            1  K0  a
            1  K1  b
            3  K1  c
            2  K2  d

Desired result:

    condition   k  s
            1  K0  a
            1  K1  b
            1  K1  c
            1  K1  c
            1  K1  c
            1  K2  d
            1  K2  d

Is it possible to do this efficiently in place, without creating a temporary df?

Answer 1

You can use loc with numpy.repeat, which is faster:

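import numpy as np
import pandas as pd
# Repeat each index label `condition` times, then renumber the rows 0..n-1
df = df.loc[np.repeat(df.index.values, df.condition)].reset_index(drop=True)
df['condition'] = 1
print(df)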
   condition   k  s
0          1  K0  a
1          1  K1  b
2          1  K1  c
3          1  K1  c
4          1  K1  c
5          1  K2  d
6          1  K2  d
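The same idea can also be written with pandas' own Index.repeat method, with DataFrame.assign resetting condition in the same expression. The following is a minimal sketch that rebuilds the sample frame from the question. It assumes condition holds non-negative integers: a row with 0 would be dropped, and a negative count raises an error.

```python
import pandas as pd

df = pd.DataFrame({'k': ['K0', 'K1', 'K1', 'K2'],
                   'condition': [1, 1, 3, 2],
                   's': ['a', 'b', 'c', 'd']})

# Repeat each index label `condition` times and select those rows with loc,
# renumber the index 0..n-1, then overwrite condition with 1 for every row.
out = (df.loc[df.index.repeat(df['condition'].to_numpy())]
         .reset_index(drop=True)
         .assign(condition=1))
print(out)
```

This still builds a new DataFrame and rebinds the name, just like the answer's version. The rows are copied either way, so the difference is only in style.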

Another solution uses groupby with concat and sets condition to 1 at the end, but it is slower. Because it groups rows by condition, rows that share a condition value end up next to each other. The output below matches the original order only because of how the sample's values are arranged.

df = (df.groupby('condition', as_index=False, sort=False)
        .apply(lambda x: pd.concat([x] * x.condition.values[0], ignore_index=True))
        .reset_index(drop=True))
df['condition'] = 1
print(df)
   condition   k  s
0          1  K0  a
1          1  K1  b
2          1  K1  c
3          1  K1  c
4          1  K1  c
5          1  K2  d
6          1  K2  d

Timings:

In [917]: %timeit df.loc[np.repeat(df.index.values,df.condition)].reset_index(drop=True)
The slowest run took 4.55 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 1.04 ms per loop

In [918]: %timeit df.groupby('condition', as_index=False, sort=False).apply(lambda x: pd.concat([x]*x.condition.values[0], ignore_index=True)).reset_index(drop=True)
100 loops, best of 3: 7.78 ms per loop
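The %timeit comparison is an IPython session on a very small frame, where fixed per-call overhead dominates. Below is a rough sketch of repeating a timing outside IPython using the standard timeit module on a larger synthetic frame. The data, sizes, and the function names via_np_repeat and via_index_repeat are hypothetical and chosen only for illustration. The numbers it prints will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 100,000 rows with repeat counts from 1 to 3.
rng = np.random.default_rng(0)
n = 100_000
big = pd.DataFrame({'k': rng.integers(0, 100, n).astype(str),
                    'condition': rng.integers(1, 4, n),
                    's': rng.integers(0, 100, n).astype(str)})

def via_np_repeat(frame):
    # The answer's approach: repeat the index values with NumPy, then select with loc.
    return frame.loc[np.repeat(frame.index.values, frame['condition'])].reset_index(drop=True)

def via_index_repeat(frame):
    # Same selection, using the pandas Index.repeat method instead.
    return frame.loc[frame.index.repeat(frame['condition'].to_numpy())].reset_index(drop=True)

for func in (via_np_repeat, via_index_repeat):
    # Best of 3 runs of 5 calls each, reported as time per call.
    seconds = min(timeit.repeat(lambda: func(big), number=5, repeat=3)) / 5
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```

Both functions return the same rows. Only the source of the repeated index differs, so any gap between them mostly reflects overhead rather than a different algorithm.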

emp_num trans_date  day_type
5667    2016-03-01  1
5667    2016-03-02  1
5667    2016-03-03  1
5667    2016-03-04  3
5667    2016-03-05  3
5667    2016-03-06  1
5667    2016-03-07  1
5667    2016-03-08  1
5667    2016-03-09  1
5667    2016-03-10  1
5667    2016-03-11  3
5667    2016-03-12  3
5667    2016-03-13  1
5667    2016-03-14  1
5667    2016-03-15  1
5667    2016-03-16  1
5667    2016-03-17  1
5667    2016-03-18  3
5667    2016-03-19  3
5667    2016-03-20  1
5667    2016-03-21  1
5667    2016-03-22  1
5667    2016-03-23  1
5667    2016-03-24  1
5667    2016-03-25  3
5667    2016-03-26  3
5667    2016-03-27  1
5667    2016-03-28  1
5667    2016-03-29  1
5667    2016-03-30  1
5667    2016-03-31  1

pandas: efficiently multiply rows based on a condition
