Efficiency of merging pandas DataFrames: by index vs. by data

Time: 2018-12-11 15:05:37

Tags: python pandas merge benchmarking

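Merging on the index seems to be less efficient than merging on the data columns: in the benchmark below it takes roughly 2.7 times as long. Am I right? Does this make sense?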

import timeit

import numpy as np
import pandas as pd

rows = 10**5

# Two frames of random integers in columns A, B, C, each sorted by the join keys.
# (DataFrame.sort() no longer exists in current pandas; sort_values() replaces it.)
data = np.random.randint(0, 100, size=(rows, 3))
df1 = pd.DataFrame(data, columns=['A', 'B', 'C'])
df1.sort_values(by=['A', 'B'], inplace=True)
data = np.random.randint(0, 100, size=(rows, 3))
df2 = pd.DataFrame(data, columns=['A', 'B', 'C'])
df2.sort_values(by=['A', 'B'], inplace=True)

times = 10

# Outer merge using A and B as ordinary key columns.
print('merge by data: {}'.format(
    timeit.timeit(
        stmt="df1.merge(df2, on=['A', 'B'], how='outer')",
        setup='from __main__ import df1, df2', number=times) / times))

# Move A and B into a MultiIndex (not timed), then outer merge on the index.
df1.set_index(['A', 'B'], inplace=True)
df2.set_index(['A', 'B'], inplace=True)

print('merge by index: {}'.format(
    timeit.timeit(
        stmt="df1.merge(df2, right_index=True, left_index=True, how='outer')",
        setup='from __main__ import df1, df2', number=times) / times))
Output (seconds per merge, mean of 10 runs):
merge by data: 0.0498219966888
merge by index: 0.133691716194
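The benchmark above leaves two things open: whether both variants really return the same rows, and how noisy a mean over 10 runs is. The following is a minimal sketch, not an explanation of the gap, that tightens the measurement under stated assumptions: the helper names `make_frames`, `canonical` and `benchmark` are hypothetical, it assumes NumPy 1.17+ (for `default_rng`) and a pandas recent enough to have `sort_values`, and it reports the best of several repeats instead of a mean. Before timing anything it checks that the column-based and index-based outer merges produce the same multiset of rows.

```python
import timeit

import numpy as np
import pandas as pd


def make_frames(rows, seed=0):
    """Hypothetical helper: two random frames (A, B, C), each sorted by A, B."""
    rng = np.random.default_rng(seed)
    frames = []
    for _ in range(2):
        data = rng.integers(0, 100, size=(rows, 3))
        df = pd.DataFrame(data, columns=['A', 'B', 'C'])
        frames.append(df.sort_values(by=['A', 'B']))
    return frames


def canonical(result):
    """Hypothetical helper: A, B as columns, rows sorted, fresh RangeIndex."""
    if 'A' not in result.columns:
        # Index-based merge: move the (A, B) MultiIndex back into columns.
        result = result.reset_index()
    cols = ['A', 'B', 'C_x', 'C_y']
    return result[cols].sort_values(cols).reset_index(drop=True)


def benchmark(rows=10**5, number=1, repeat=3):
    """Hypothetical helper: best-of-`repeat` seconds per merge for each variant."""
    df1, df2 = make_frames(rows)
    # Build the MultiIndex once, outside the timed code, as in the question.
    idx1 = df1.set_index(['A', 'B'])
    idx2 = df2.set_index(['A', 'B'])

    def by_columns():
        return df1.merge(df2, on=['A', 'B'], how='outer')

    def by_index():
        return idx1.merge(idx2, left_index=True, right_index=True, how='outer')

    # Sanity check: both variants must yield the same rows before being compared.
    pd.testing.assert_frame_equal(canonical(by_columns()), canonical(by_index()),
                                  check_dtype=False)

    # The minimum over repeats is usually the least noisy per-call estimate.
    return {
        'merge by data': min(timeit.repeat(by_columns, number=number, repeat=repeat)) / number,
        'merge by index': min(timeit.repeat(by_index, number=number, repeat=repeat)) / number,
    }


if __name__ == '__main__':
    for name, seconds in benchmark().items():
        print('{}: {:.4f} s'.format(name, seconds))
```

As in the question, the index variant is timed after `set_index` has already run, so the cost of building the MultiIndex is not included in either number. With 10**5 rows and keys drawn from 100 × 100 combinations, each key appears about 10 times per frame, so both outer merges are many-to-many and return on the order of 10**6 rows.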

0 answers