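Assignment into a Pandas DataFrame is quite slow for some combinations of float32 and float64 dtypes.
The code below sets up a DataFrame, runs a NumPy/SciPy computation on part of the data, creates a new DataFrame by copying the old one, and assigns the result of the computation to the new DataFrame: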
import pandas as pd
import numpy as np
from scipy.signal import lfilter
from time import time

N = 1000
M = 1000

# Note: .ix was removed in pandas 1.0; on newer versions .loc is the label-based replacement.
def f(dtype1, dtype2):
    coi = [str(m) for m in range(M)]
    df = pd.DataFrame([[m for m in range(M)] + ['Hello', 'World'] for n in range(N)],
                      columns=coi + ['A', 'B'], dtype=dtype1)
    Y = lfilter([1], [0.5, 0.5], df.ix[:, coi])
    Y = Y.astype(dtype2)
    new = pd.DataFrame(df, copy=True)
    print(new.iloc[0, 0].dtype)
    print(Y.dtype)
    new.ix[:, coi] = Y  # This statement is considerably slow
    print(new.iloc[0, 0].dtype)

dtypes = [np.float32, np.float64]
for dtype1 in dtypes:
    for dtype2 in dtypes:
        print('-' * 10)
        start_time = time()
        f(dtype1, dtype2)
        print(time() - start_time)
The timing results are:
----------
float32
float32
float64
10.1998147964
----------
float32
float64
float64
10.2371120453
----------
float64
float32
float64
0.864870071411
----------
float64
float64
float64
0.866265058517
The key line here is new.ix[:, coi] = Y: for some combinations it is about ten times slower.
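The timings above cover the whole of f(), including DataFrame construction and the lfilter call. A minimal sketch of how the assignment alone could be timed is shown here. The helper name time_assignment, the smaller default sizes, and the use of new[coi] = Y as a stand-in for the removed .ix indexer are all assumptions made for illustration, so the numbers it produces are not comparable to the ones reported in this question.

```python
import time

import numpy as np
import pandas as pd


def time_assignment(dtype1, dtype2, n_rows=200, n_cols=50):
    """Return the seconds spent on the block assignment only (hypothetical helper)."""
    coi = [str(m) for m in range(n_cols)]
    # Build the numeric part directly from an array so its dtype is unambiguous.
    df = pd.DataFrame(np.zeros((n_rows, n_cols), dtype=dtype1), columns=coi)
    df["A"] = "Hello"  # one non-numeric column, loosely mirroring the question
    Y = np.ones((n_rows, n_cols), dtype=dtype2)
    new = df.copy()

    start = time.perf_counter()
    new[coi] = Y  # assumption: stands in for new.ix[:, coi] = Y
    return time.perf_counter() - start


if __name__ == "__main__":
    for dtype1 in (np.float32, np.float64):
        for dtype2 in (np.float32, np.float64):
            elapsed = time_assignment(dtype1, dtype2)
            print(np.dtype(dtype1).name, np.dtype(dtype2).name, elapsed)
```

time.perf_counter is used instead of time.time because it is intended for measuring short intervals.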
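I can understand that some reallocation overhead is needed when a float32 DataFrame is assigned float64 data. But why is the overhead so dramatic?
It also bothers me that the combination of a float32 DataFrame and a float32 assignment is slow as well, and that the result is float64.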
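Printing new.iloc[0, 0].dtype only inspects a single cell. A small sketch of how the dtypes of all columns could be summarized before and after an assignment follows; summarize_dtypes is a hypothetical helper name, not something from the code above, and it only reports dtypes without explaining why a conversion happened.

```python
import numpy as np
import pandas as pd


def summarize_dtypes(frame):
    """Map each dtype name to the number of columns that have it (hypothetical helper)."""
    return frame.dtypes.astype(str).value_counts().to_dict()


# Tiny example frame: two float32 columns and one string column.
example = pd.DataFrame({
    "0": np.zeros(3, dtype=np.float32),
    "1": np.ones(3, dtype=np.float32),
    "A": ["Hello"] * 3,
})
summary = summarize_dtypes(example)
print(summary)
```

Calling it on new right before and right after the slow statement would show whether every numeric column was converted or only some of them.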
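Answer 0 (score: 0)
Assigning a single column does not change the dtype of the assigned data, and for assignments without type conversion (float32 to float32 and float64 to float64) iterating over the columns with a for loop appears to be quite fast. For assignments that do involve a type conversion, the performance is typically about twice as bad as the worst case of the multi-column assignment: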
import pandas as pd
import numpy as np
from scipy.signal import lfilter
from time import time

N = 1000
M = 1000

# Note: .ix was removed in pandas 1.0; on newer versions .loc is the label-based replacement.
def f(dtype1, dtype2):
    coi = [str(m) for m in range(M)]
    df = pd.DataFrame([[m for m in range(M)] + ['Hello', 'World'] for n in range(N)],
                      columns=coi + ['A', 'B'], dtype=dtype1)
    Y = lfilter([1], [0.5, 0.5], df.ix[:, coi])
    Y = Y.astype(dtype2)
    new = df.copy()
    print(new.iloc[0, 0].dtype)
    print(Y.dtype)
    for n, column in enumerate(coi):  # New: for loop over the columns
        new.ix[:, column] = Y[:, n]
    print(new.iloc[0, 0].dtype)

dtypes = [np.float32, np.float64]
for dtype1 in dtypes:
    for dtype2 in dtypes:
        print('-' * 10)
        start_time = time()
        f(dtype1, dtype2)
        print(time() - start_time)
The results are:
----------
float32
float32
float32
0.809890985489
----------
float32
float64
float64
21.4767119884
----------
float64
float32
float32
20.5611870289
----------
float64
float64
float64
0.765362977982
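Since the cases without type conversion are the fast ones in the results above, one possible workaround is to cast each computed column to the dtype the target column already has before assigning it. The sketch below is an untested idea in terms of speed, not a measured fix: assign_columns_keep_dtype is a hypothetical helper, it uses new[column] = ... because .ix no longer exists in current pandas, and casting float64 values down to float32 loses precision.

```python
import numpy as np
import pandas as pd


def assign_columns_keep_dtype(frame, columns, values):
    """Assign values column by column, casting each to the column's current dtype."""
    out = frame.copy()
    for n, column in enumerate(columns):
        target = out[column].dtype
        # Cast first so the assignment itself involves no dtype change.
        out[column] = values[:, n].astype(target, copy=False)
    return out


# Small usage example with a float32 frame and float64 results.
cols = ["0", "1", "2"]
frame = pd.DataFrame(np.zeros((4, 3), dtype=np.float32), columns=cols)
frame["A"] = "Hello"
Y = np.arange(12, dtype=np.float64).reshape(4, 3)
result = assign_columns_keep_dtype(frame, cols, Y)
print(result.dtypes)
```

Whether this is acceptable depends on whether the result is really meant to stay in the original precision; if float64 output is wanted, converting the whole frame once up front may be the simpler route.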