What effect does DataFrame.loc have on the DataFrame?

Time: 2016-10-02 22:07:58

Tags: python numpy

I use numpy.random.permutation() to generate a random ordering for my original DataFrame X, and I want to assign all of X to X_perm in that random order.

X_perm=X
y_perm=y
perm = np.random.permutation(X.shape[0])
for i in range(len(perm)):
  X_perm.loc[i]=(X.loc[perm[i]])
  y_perm.loc[i]=(y.loc[perm[i]])

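I just noticed that after running this code, X[0:1] returns a different first row of X than it did before the code ran.

This is strange. I did not operate on X at all; I only assigned its values to a new DataFrame. How did that change the values in X? Cheers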

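1 Answer:

Answer 0: (score: 0)

This unexpected behavior occurs because X_perm is not an object independent of X. The assignment X_perm = X copies nothing; it makes X_perm a second name for the same object, so every modification made through X_perm is also a modification of X.

To demonstrate this with a NumPy array: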

import numpy as np

# plain assignment: both names refer to the same array
a = np.arange(16)
print(a)
b = a  # as your X_perm = X
print(b)  # same as print(a) above
b[0] = -999
print(a)  # has been modified
print(b)  # has been modified

a[-1] = -999
print(a)  # has been modified
print(b)  # has been modified

# using copy
a = np.arange(16)
print(a)
b = a.copy()  # b is an independent copy of the array
print(b)  # same as print(a) above
b[0] = -999
print(a)  # has NOT been modified
print(b)  # has been modified

a[-1] = -999
print(a)  # has been modified
print(b)  # has NOT been modified
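
The same aliasing can be checked directly on a DataFrame. The following is a minimal sketch, assuming pandas is installed; the small frame X and the names X_alias and X_copy are hypothetical stand-ins, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the asker's X
X = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

X_alias = X        # second name for the same object, like X_perm = X
X_copy = X.copy()  # independent copy (deep by default)

X_alias.loc[0] = [-999, -999]  # writes through to X
X_copy.loc[1] = [0, 0]         # leaves X untouched

print(X_alias is X)  # same object
print(X_copy is X)   # different object
print(X)
```

The `is` operator compares object identity, so it is a quick way to tell whether two names point at the same DataFrame before writing through one of them.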

To do what you want, you need to make X_perm a copy of X.

X_perm = X.copy()
y_perm = y.copy()  # y_perm = y aliases y in the same way
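
As a possible alternative to copying and then filling rows in a loop, the permutation can be applied in one step with positional indexing. This is only a sketch under assumptions: the small X and y below are hypothetical stand-ins, and it assumes the goal is a shuffled result with a fresh 0..n-1 index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's X and y
X = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
y = pd.Series([0, 1, 0, 1], name="label")

perm = np.random.permutation(X.shape[0])

# iloc with an integer array returns new objects; X and y are not modified
X_perm = X.iloc[perm].reset_index(drop=True)
y_perm = y.iloc[perm].reset_index(drop=True)

print(X_perm)
print(y_perm)
```

Because iloc selects by position, this does not depend on X having the default integer index, which the loop with .loc[i] implicitly assumes.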

See also the relevant NumPy documentation on copy.