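Does anyone know why the copy() method on pandas objects seems so much slower than rebuilding the object? Is there any reason to use copy() rather than the standard constructor?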
Here are some quick results:
In [42]: import pandas as pd
In [43]: df = pd.DataFrame(np.random.rand(300000).reshape(100000,3), columns=list('ABC'))
In [44]: %timeit pd.DataFrame(df)
The slowest run took 5.61 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.95 µs per loop
In [45]: %timeit df.copy()
The slowest run took 5.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 390 µs per loop
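The difference between the two copy operations also carries over to pandas Series. Interestingly, NumPy arrays do not show the same behavior, for example: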
In [48]: import numpy as np
In [49]: myarray = np.random.rand(300000)
In [50]: %timeit myarray.copy()
10000 loops, best of 3: 162 µs per loop
In [52]: %timeit np.array(myarray)
10000 loops, best of 3: 168 µs per loop
Answer 0: (score: 3)
This is because copy() actually creates a new internal representation of the DataFrame, whereas the constructor just points to the same one.
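A corollary is that if you mutate the original DataFrame, the change shows up in df2 (built with the constructor) but not in df1 (the copy). First, compare the identity of the internal data held by each object: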
In [11]: df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
In [12]: id(df._data) # internal attribute, don't futz with it!
Out[12]: 4472136472
In [13]: df1 = df.copy()
In [14]: id(df1._data) # different object
Out[14]: 4472572448
In [15]: df2 = pd.DataFrame(df)
In [16]: id(df2._data) # same as df._data
Out[16]: 4472136472
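That is why you want to use copy()!
In NumPy, by contrast, both copy() and the array constructor copy the data. Mutating the original DataFrame shows the difference between df1 and df2: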
In [21]: df.iloc[0, 0] = 99
In [22]: df
Out[22]:
A B
0 99 2
1 3 4
In [23]: df1
Out[23]:
A B
0 1 2
1 3 4
In [24]: df2
Out[24]:
A B
0 99 2
1 3 4
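The NumPy remark earlier in this answer can be checked the same way. The sketch below assumes only standard NumPy behavior: `ndarray.copy()` and `np.array(...)` (whose `copy` argument defaults to true) both duplicate the data, while `np.asarray(...)` returns the input array itself when no conversion is needed. The variable names are illustrative.

```python
import numpy as np

original = np.array([1.0, 2.0, 3.0])

via_copy = original.copy()             # explicit copy
via_constructor = np.array(original)   # np.array copies by default
via_asarray = np.asarray(original)     # no copy for an existing ndarray

# Mutate the original and see which of the three follow it.
original[0] = 99.0

print("copy():      ", via_copy)
print("np.array():  ", via_constructor)
print("np.asarray():", via_asarray)
```

This matches the timings in the question: `myarray.copy()` and `np.array(myarray)` take about the same time because both do the same copying work.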