import pandas as pd
import time
df = pd.DataFrame({'Current': [1, 3, -4, 9, -3, 1, -2]})
# Method-1
start1 = time.time()
neg_index1 = df[(df["Current"]<0)].index.tolist()
print(neg_index1)
end1 = time.time()
print("Method-1 time is = ",end1 - start1)
# Method-2
start2 = time.time()
neg_index2 = df.iloc[df["Current"].lt(0).values].index.tolist()
print(neg_index2)
end2 = time.time()
print("Method-2 time is = ",end2 - start2)
On the first run, Method-2 was faster:
[2, 4, 6]
Method-1 time is = 0.002000093460083008
[2, 4, 6]
Method-2 time is = 0.0009999275207519531
Output of the second run; interestingly, both timings are identical:
[2, 4, 6]
Method-1 time is = 0.0009999275207519531
[2, 4, 6]
Method-2 time is = 0.0009999275207519531
Output of the fourth run; surprisingly, Method-1 is faster here:
[2, 4, 6]
Method-1 time is = 0.0009999275207519531
[2, 4, 6]
Method-2 time is = 0.0019998550415039062
Can someone explain this and help me understand which method is faster?
Answer 0 (score: 2)
I would prefer to use np.where:
np.where(df['Current']<0)[0].tolist()
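One caveat worth checking before swapping methods: np.where returns positions (0, 1, 2, ...), while Method-1 and Method-2 return index labels. They only coincide here because the DataFrame has the default RangeIndex. The sketch below uses a hypothetical non-default index (10 to 16) to show the difference and how to map positions back to labels.

```python
import numpy as np
import pandas as pd

# Same values as the question, but with hypothetical non-default index labels
df = pd.DataFrame({'Current': [1, 3, -4, 9, -3, 1, -2]},
                  index=[10, 11, 12, 13, 14, 15, 16])

mask = df['Current'] < 0

# np.where on a 1-D condition returns a tuple holding one array of positions
positions = np.where(mask)[0].tolist()

# Boolean selection on the index keeps the labels instead
labels = df.index[mask.to_numpy()].tolist()

# Positions can be translated back to labels through the index
labels_from_positions = df.index[positions].tolist()

print(positions)              # [2, 4, 6]
print(labels)                 # [12, 14, 16]
print(labels_from_positions)  # [12, 14, 16]
```

If the labels are what you need and the index is not a plain RangeIndex, the extra `df.index[...]` lookup is required for np.where to give the same answer as Method-1.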
Also, don't use time.time; use timeit instead:
import pandas as pd
import numpy as np
import timeit
df = pd.DataFrame({'Current': [1, 3, -4, 9, -3, 1, -2]})
# timeit.timeit returns the total time in seconds for all `number` calls
# Method-1: boolean indexing, then read the index labels
neg_index1 = df[(df["Current"]<0)].index.tolist()
print(neg_index1)
print("Method-1 time is = ",timeit.timeit(lambda: df[(df["Current"]<0)].index.tolist(),number=10))
# Method-2: positional selection with a NumPy boolean array
neg_index2 = df.iloc[df["Current"].lt(0).values].index.tolist()
print(neg_index2)
print("Method-2 time is = ",timeit.timeit(lambda: df.iloc[df["Current"].lt(0).values].index.tolist(),number=10))
# Method-3: np.where returns the positions of the True values
neg_index3 = np.where(df['Current']<0)[0].tolist()
print(neg_index3)
print("Method-3 time is = ",timeit.timeit(lambda: np.where(df['Current']<0)[0].tolist(),number=10))
Output:
[2, 4, 6]
Method-1 time is = 0.0211404744016608
[2, 4, 6]
Method-2 time is = 0.02377961247025239
[2, 4, 6]
Method-3 time is = 0.007515077367731743
So np.where wins by a significant margin!
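With only `number=10` calls per method, a single timeit.timeit result can still be noisy. A sketch of a more robust comparison, assuming the same three methods: run several independent rounds with timeit.repeat and keep the fastest round. The Python timeit documentation suggests that the minimum is usually the most meaningful number, since slower rounds mostly reflect interference from other processes. The round and call counts here are arbitrary choices.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'Current': [1, 3, -4, 9, -3, 1, -2]})

candidates = {
    "boolean indexing": lambda: df[df["Current"] < 0].index.tolist(),
    "iloc + .values": lambda: df.iloc[df["Current"].lt(0).values].index.tolist(),
    "np.where": lambda: np.where(df["Current"] < 0)[0].tolist(),
}

# Sanity check: every method should find the same rows before we time them
outputs = {name: fn() for name, fn in candidates.items()}

NUMBER = 200  # calls per round (arbitrary)
results = {}
for name, fn in candidates.items():
    # repeat() returns one total time (seconds) per round of NUMBER calls
    rounds = timeit.repeat(fn, repeat=5, number=NUMBER)
    # Fastest round divided by the call count gives a per-call estimate
    results[name] = min(rounds) / NUMBER
    print(f"{name}: {results[name] * 1e6:.1f} µs per call")
```

Absolute numbers depend on the machine and the pandas/NumPy versions, so compare the methods relative to each other rather than reading too much into the raw values.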
Answer 1 (score: 0)
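While you measure the time each execution takes, other processes may be consuming resources. The garbage collector may also kick in at arbitrary moments and skew the results. For these reasons, don't use time.time() to compare performance.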
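The question's timings cluster at multiples of about 0.001 s (0.0009999..., 0.0020000..., 0.0019998...), which suggests the clock was only ticking at roughly millisecond granularity on that machine while the operations themselves took less than that. The sketch below prints the advertised resolution of time.time versus time.perf_counter on your system, and shows perf_counter for a one-off manual measurement; the summed range is only a placeholder workload.

```python
import time

# Compare the advertised resolution of the wall clock and the performance counter
for clock in ("time", "perf_counter"):
    info = time.get_clock_info(clock)
    print(f"{clock}: resolution={info.resolution}, monotonic={info.monotonic}")

# For a single manual measurement, perf_counter is monotonic and high-resolution
start = time.perf_counter()
total = sum(range(100_000))  # placeholder workload
elapsed = time.perf_counter() - start
print(f"elapsed: {elapsed:.6f} s")
```

Even with a better clock, a single run is still exposed to the noise described above, so this is no substitute for repeated measurements.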
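Use timeit.timeit to measure performance. It runs the code many times and reports the total time for all runs, which is far more reliable than timing a single execution. It also disables garbage collection while timing.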
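Disabling garbage collection makes timings more repeatable, but if the code you care about normally runs with GC on, that can flatter it. The timeit documentation notes that GC can be re-enabled from the setup statement. A sketch comparing both settings for Method-1 (round and call counts are arbitrary):

```python
import timeit

import pandas as pd

df = pd.DataFrame({'Current': [1, 3, -4, 9, -3, 1, -2]})

def method1():
    return df[df["Current"] < 0].index.tolist()

# Default behaviour: timeit switches garbage collection off while timing
gc_off = min(timeit.repeat(method1, repeat=5, number=200))

# Re-enable GC for the timed run through the setup statement
gc_on = min(timeit.repeat(method1, setup="import gc; gc.enable()",
                          repeat=5, number=200))

print(f"GC disabled: {gc_off:.4f} s per 200 calls")
print(f"GC enabled:  {gc_on:.4f} s per 200 calls")
```

For a tiny frame like this one the two numbers are likely to be close; the difference matters more when the measured code allocates many objects.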