Comparing the performance of df.apply and column operations in Python pandas

Time: 2018-02-01 11:58:45

Tags: python performance pandas dataframe apply

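I would like to know whether basic arithmetic on the columns of a DataFrame runs faster when done column-wise or via apply. Intuitively, I assume the column-wise approach is faster. But both approaches are said to be 'vectorized' operations. So, can df.apply compete in terms of speed?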

1 answer:

Answer 0: (score: 2)

We can simply try it out. The example below shows that the column-wise operation is (much) faster:

import numpy as np
import pandas as pd
from datetime import datetime


def applywise_duration(df):
    # Time a row-by-row addition: the lambda is called once for every row.
    start_time = datetime.now()
    df['C'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
    end_time = datetime.now()
    duration = end_time - start_time
    return duration

def columnwise_duration(df):
    # Time a single addition over the two whole columns.
    start_time = datetime.now()
    df['C'] = df['A'] + df['B']
    end_time = datetime.now()
    duration = end_time - start_time
    return duration

df_apply = pd.DataFrame(
        np.random.randint(0,10000,size=(1000000, 2)),
        columns=list('AB')
)
df_vector = df_apply.copy()

# Store the results under new names so the timing functions are not overwritten.
apply_time = applywise_duration(df_apply)
column_time = columnwise_duration(df_vector)

print('Duration of apply: ', apply_time)
print('Duration of columnwise addition: ', column_time)
print('Ratio: ', column_time / apply_time)
print('That means, columnwise addition is %s times faster '
      'than addition via apply!'
      % str(apply_time / column_time))

This gives the following on my machine:

Duration of apply:  0:00:23.631236
Duration of columnwise addition:  0:00:00.004234
Ratio:  0.00017916963801639492
That means, columnwise addition is 5581.302786962683 times faster than addition via apply!
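
One plausible reason for the gap is that df.apply with axis=1 calls the Python function once per row, while the column-wise addition hands both columns to NumPy in a single operation. The sketch below is not part of the original answer; it only illustrates that idea. The helper add_row, the counter calls, and the frame size of 5,000 rows are assumptions made for illustration, and timeit is swapped in for datetime.now() because it repeats the measurement and is less noisy.

```python
import timeit

import numpy as np
import pandas as pd

# Small frame so the row-wise version finishes quickly (size is an arbitrary choice).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10000, size=(5_000, 2)), columns=list("AB"))

calls = 0  # hypothetical counter: how often pandas invokes our function


def add_row(row):
    """Add A and B for a single row and record the call."""
    global calls
    calls += 1
    return row["A"] + row["B"]


via_apply = df.apply(add_row, axis=1)  # Python-level call for every row
via_columns = df["A"] + df["B"]        # one addition over the whole columns

print("Rows:", len(df), "| Python calls made by apply:", calls)
print("Same result:", bool((via_apply == via_columns).all()))

# Take the best of three runs for each approach.
t_apply = min(timeit.repeat(
    lambda: df.apply(lambda r: r["A"] + r["B"], axis=1), number=1, repeat=3))
t_cols = min(timeit.repeat(lambda: df["A"] + df["B"], number=1, repeat=3))
print(f"apply: {t_apply:.4f}s, column-wise: {t_cols:.6f}s")
```

The exact timings and the ratio depend on the machine, the pandas version, and the frame size, so the figures printed here will not match the ones quoted above.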