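I am copying a pandas object column into a separate ordered categorical column, but I get a warning, and for the life of me I cannot figure out how to do it correctly.
I can't post the entire DataFrame, but this is the syntax I am using: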
from pandas.api.types import CategoricalDtype
marriage_cat_type = CategoricalDtype(categories= ['M_22', 'M_23', 'M_24', 'M_25', 'M_26', 'M_27', 'M_28', 'M_29', 'M_30'
, 'M_31', 'M_32', 'M_33', 'M_34', 'M_35', 'M_36', 'M_37', 'M_38', 'M_39'
, 'M_40', 'M_41', 'M_42', 'M_43', 'M_44', 'M_45', 'M_46', 'M_47', 'M_48'
, 'M_49', 'M_50', 'M_51', 'M_52', 'M_53', 'M_54', 'M_55', 'M_56', 'M_57'
, 'M_58', 'M_59', 'M_60', 'M_61', 'M_62', 'M_63', 'M_64', 'M_65', 'M_66'
, 'M_67', 'M_68', 'M_69', 'M_70', 'M_71', 'M_72', 'M_73', 'M_74', 'M_75'
, 'M_76', 'M_77', 'M_78', 'M_79', 'M_80', 'M_81', 'M_82', 'M_999', 'S_18'
, 'S_19', 'S_20', 'S_21', 'S_22', 'S_23', 'S_24', 'S_25', 'S_26', 'S_27'
, 'S_28', 'S_29', 'S_30', 'S_31', 'S_32', 'S_33', 'S_34', 'S_35', 'S_36'
, 'S_37', 'S_38', 'S_39', 'S_40', 'S_41', 'S_42', 'S_43', 'S_44', 'S_45'
, 'S_46', 'S_47', 'S_48', 'S_49', 'S_50', 'S_51', 'S_52', 'S_53', 'S_54'
, 'S_55', 'S_56', 'S_57', 'S_58', 'S_59', 'S_60', 'S_61', 'S_62', 'S_63'
, 'S_64', 'S_65', 'S_66', 'S_67', 'S_68', 'S_69', 'S_70', 'S_71', 'S_72'
, 'S_73', 'S_74', 'S_75', 'S_77', 'S_79', 'S_999'], ordered = True)
coll_train['marriage_statusXage_codes'] = coll_train['marital_statusXage2'].astype(marriage_cat_type)
I get this warning:
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I also tried this, which produces the same warning:
coll_train['marriage_statusXage_codes'] = coll_train.loc[:, 'marital_statusXage2'].astype(marriage_cat_type)
Can anyone point me in the right direction?
Answer 0 (score: 1)
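This is a chained assignment problem. How pandas reacts to it is controlled by pd.set_option('chained_assignment', None|'warn'|'raise').
The warning is switched on, and pandas is complaining about coll_train, most likely because coll_train was itself created as a slice of another DataFrame.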
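There are two options. First, make sure coll_train really is the source DataFrame you want to modify (which is what you are doing by adding a new column named marriage_statusXage_codes to it). Second, if it is, and pandas is wrong, silence the warning with pd.set_option('chained_assignment', None). Is pandas wrong here? I don't know.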
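A minimal sketch of the first option, assuming coll_train was produced by filtering a larger DataFrame (the question does not show how it was built). The names full and is_train and the shortened category list are hypothetical, used only for illustration. Taking an explicit .copy() makes coll_train an independent frame, so adding the ordered categorical column is no longer a write to a slice.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical stand-in for the full data set, which the question does not show.
full = pd.DataFrame({
    "marital_statusXage2": ["M_22", "S_18", "M_23", "S_19"],
    "is_train": [True, True, False, True],
})

# Shortened category list, for illustration only.
cat_type = CategoricalDtype(categories=["M_22", "M_23", "S_18", "S_19"], ordered=True)

# Assumption: coll_train is a filtered subset. The explicit copy makes it
# an independent DataFrame rather than a slice of `full`.
coll_train = full[full["is_train"]].copy()

# Adding the new column now targets coll_train itself.
coll_train["marriage_statusXage_codes"] = (
    coll_train["marital_statusXage2"].astype(cat_type)
)

print(coll_train["marriage_statusXage_codes"].dtype)
print("marriage_statusXage_codes" in full.columns)
```

The new column exists only on coll_train; the frame it was filtered from is left untouched. If the intent is instead to modify the larger frame, the assignment should be made on that frame directly.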
Here is an illustration of setting values on a slice.
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in recent pandas versions
print(pd.__version__)
csvdata = StringIO("""date,LASTA,LASTB,LASTC
1999-03-15,2.5597,8.20145,16.900
1999-03-31,2.7724,7.73057,16.955
1999-04-01,2.8321,7.63714,17.500
1999-04-06,2.8537,7.63703,17.750""")
df = pd.read_csv(csvdata, sep=",", index_col="date", parse_dates=True, infer_datetime_format=True)
pd.set_option('chained_assignment','warn')
a_slice = df['1999-03-31':'1999-04-01']
print(id(df), id(a_slice))
# generates the warning
a_slice['LASTA'] = 10
# original does not have the data set on a slice!
print(df[df['LASTA'] == 10]['LASTA'].any())
# create a new object to which values can be set, no warning.
a_slice = a_slice.copy()
a_slice['LASTA'] = 10
print(a_slice[a_slice['LASTA'] == 10]['LASTA'].any())
Result
0.20.3
(4549520208, 4594637776)
slicecopy.py:20: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
a_slice['LASTA'] = 10
False
True
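For the opposite case, where the goal is to change the original DataFrame rather than a copy, the warning text suggests a single .loc[row_indexer, col_indexer] assignment. The sketch below reuses the sample data from the illustration above and assumes Python 3 with a reasonably recent pandas; it is one way to do it, not the only one.

```python
import pandas as pd
from io import StringIO

csvdata = StringIO("""date,LASTA,LASTB,LASTC
1999-03-15,2.5597,8.20145,16.900
1999-03-31,2.7724,7.73057,16.955
1999-04-01,2.8321,7.63714,17.500
1999-04-06,2.8537,7.63703,17.750""")
df = pd.read_csv(csvdata, index_col="date", parse_dates=True)

# Select the rows and the column in one .loc call, so the assignment
# is applied to df itself and not to an intermediate slice.
df.loc["1999-03-31":"1999-04-01", "LASTA"] = 10

# The original DataFrame now holds the new values for the two selected dates.
print(df["LASTA"])
```

Because rows and column are selected in the same indexing operation, there is no intermediate object for pandas to be unsure about.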