Question

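Here is my code and the warning message. If I change s to a standalone Series, s = pd.Series(np.random.randn(5)), the warning does not appear. I am using Python 2.7 on Windows.

It seems that a Series created on its own behaves differently from a Series taken from a DataFrame column? Thanks.

My goal is to change the values of the Series itself, not the values of a copy.

Source code: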

import pandas as pd

sample = pd.read_csv('123.csv', header=None, skiprows=1,
       dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
s = sample['c_d']
#s = pd.Series(np.random.randn(5))
for i in range(len(s)):
    if s.iloc[i] > 0:
        s.iloc[i] = s.iloc[i] + 1
    else:
        s.iloc[i] = s.iloc[i] - 1

Warning message:

C:\Python27\lib\site-packages\pandas\core\indexing.py:132: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)

Contents of 123.csv:

c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,1.0
go,c++,std,0.0

Edit 1: the lambda solution does not seem to work. I printed s before and after, and the values are the same:

import pandas as pd

sample = pd.read_csv('123.csv', header=None, skiprows=1,
       dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
s = sample['c_d']
print s
s.apply(lambda x:x+1 if x>0 else x-1)
print s

0    0
1    1
2    0
3    1
4    0
Name: c_d, dtype: int64
Backend TkAgg is interactive backend. Turning interactive mode on.
0    0
1    1
2    0
3    1
4    0

Regards, Lin

Answer 1

I suggest using the apply function instead:

s.apply(lambda x:x+1 if x>0 else x-1)
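
Note that `Series.apply` returns a new Series and leaves `s` as it was, which is why printing `s` before and after the call shows the same values. A minimal sketch, assuming the goal is to update the `c_d` column of `sample` itself: assign the result back to the column. The inline CSV text stands in for `123.csv`, and the `np.where` line is a vectorized alternative that is not part of the original answer.

```python
import io

import numpy as np
import pandas as pd

# Inline stand-in for 123.csv so the sketch is self-contained.
csv_text = """c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,1.0
go,c++,std,0.0
"""

sample = pd.read_csv(io.StringIO(csv_text))
sample['c_d'] = sample['c_d'].astype('int64')

# apply() builds a new Series; it does not modify the one it is called on.
updated = sample['c_d'].apply(lambda x: x + 1 if x > 0 else x - 1)

# Vectorized equivalent (an alternative, not from the original answer).
vectorized = np.where(sample['c_d'] > 0, sample['c_d'] + 1, sample['c_d'] - 1)

# Assign the result back to the column on the DataFrame itself.
sample['c_d'] = updated
print(sample['c_d'].tolist())
```

Assigning a whole column on `sample` like this writes to the DataFrame directly rather than to an object derived from it, so it should not trigger SettingWithCopyWarning.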

Answer 2

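When you do s = sample['c_d'], any change you make to the values of s also changes the original DataFrame sample. That is why you get the warning.

You can use s = sample['c_d'].copy() instead, so that changing the values of s does not change the values of the c_d column of the DataFrame sample.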
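
A minimal sketch of the `.copy()` suggestion, using a small hand-built DataFrame in place of `123.csv` (an assumption made for illustration). It only demonstrates the copy case: whether writes to a plain `sample['c_d']` propagate back to `sample` depends on the pandas version and its copy-on-write setting.

```python
import pandas as pd

# Hand-built stand-in for the data in 123.csv; only c_d matters here.
sample = pd.DataFrame({
    'c_a': ['hello', 'hi', 'ho', 'ho', 'go'],
    'c_d': [0, 1, 0, 1, 0],
})

# An explicit copy is independent of the DataFrame it came from.
s = sample['c_d'].copy()
for i in range(len(s)):
    if s.iloc[i] > 0:
        s.iloc[i] = s.iloc[i] + 1
    else:
        s.iloc[i] = s.iloc[i] - 1

print(s.tolist())              # the modified copy
print(sample['c_d'].tolist())  # the original column, unchanged
```

If the aim is the opposite, that is, to change `sample` itself, one option is to write the modified values back with `sample['c_d'] = s`.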

pandas: different behavior of a standalone Series and a Series taken from a DataFrame
