Question

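I am copying a pandas object column into a separate ordered categorical column, but I get a warning, and for the life of me I can't figure out how to do it correctly.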

I can't post the whole DataFrame, but this is the syntax I'm using:

from pandas.api.types import CategoricalDtype

marriage_cat_type = CategoricalDtype(categories= ['M_22', 'M_23', 'M_24', 'M_25', 'M_26', 'M_27', 'M_28', 'M_29', 'M_30'
                                                  , 'M_31', 'M_32', 'M_33', 'M_34', 'M_35', 'M_36', 'M_37', 'M_38', 'M_39'
                                                  , 'M_40', 'M_41', 'M_42', 'M_43', 'M_44', 'M_45', 'M_46', 'M_47', 'M_48'
                                                  , 'M_49', 'M_50', 'M_51', 'M_52', 'M_53', 'M_54', 'M_55', 'M_56', 'M_57'
                                                  , 'M_58', 'M_59', 'M_60', 'M_61', 'M_62', 'M_63', 'M_64', 'M_65', 'M_66'
                                                  , 'M_67', 'M_68', 'M_69', 'M_70', 'M_71', 'M_72', 'M_73', 'M_74', 'M_75'
                                                  , 'M_76', 'M_77', 'M_78', 'M_79', 'M_80', 'M_81', 'M_82', 'M_999', 'S_18'
                                                  , 'S_19', 'S_20', 'S_21', 'S_22', 'S_23', 'S_24', 'S_25', 'S_26', 'S_27'
                                                  , 'S_28', 'S_29', 'S_30', 'S_31', 'S_32', 'S_33', 'S_34', 'S_35', 'S_36'
                                                  , 'S_37', 'S_38', 'S_39', 'S_40', 'S_41', 'S_42', 'S_43', 'S_44', 'S_45'
                                                  , 'S_46', 'S_47', 'S_48', 'S_49', 'S_50', 'S_51', 'S_52', 'S_53', 'S_54'
                                                  , 'S_55', 'S_56', 'S_57', 'S_58', 'S_59', 'S_60', 'S_61', 'S_62', 'S_63'
                                                  , 'S_64', 'S_65', 'S_66', 'S_67', 'S_68', 'S_69', 'S_70', 'S_71', 'S_72'
                                                  , 'S_73', 'S_74', 'S_75', 'S_77', 'S_79', 'S_999'], ordered = True)

coll_train['marriage_statusXage_codes'] = coll_train['marital_statusXage2'].astype(marriage_cat_type)

I get this warning:

C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

I also tried this, which fails:

coll_train['marriage_statusXage_codes'] = coll_train.loc[:, 'marital_statusXage2'].astype(marriage_cat_type)

Can anyone point me in the right direction?

Answer 1

This is a chained assignment issue. The behavior can be controlled with pd.set_option('chained_assignment', None|'warn'|'raise').

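The warning is enabled, and pandas is flagging coll_train, most likely because it was derived from a slice of another DataFrame.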

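There are a couple of options. First, make sure coll_train is the source DataFrame you actually want to modify (which is what you are doing by adding a new column named marriage_statusXage_codes to it). If it is, and pandas is wrong, then set pd.set_option('chained_assignment', None). Can pandas be wrong about this? I don't know.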
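As a rough sketch of the first option: the question does not show how coll_train was built, so the code below assumes it was filtered out of a larger frame. The names `full` and `is_train`, the sample rows, and the shortened category list are all hypothetical stand-ins. Taking an explicit .copy() makes coll_train an independent DataFrame, so adding the categorical column to it is unambiguous.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical stand-in for the asker's data; the real frame was not posted.
full = pd.DataFrame({
    "marital_statusXage2": ["M_22", "S_18", "M_23", "S_19"],
    "is_train": [True, True, False, True],
})

# Shortened category list, for illustration only.
marriage_cat_type = CategoricalDtype(
    categories=["M_22", "M_23", "S_18", "S_19"], ordered=True
)

# Assumption: coll_train is a filtered subset of a larger frame.
# The explicit copy detaches it from `full` before any column is added.
coll_train = full[full["is_train"]].copy()
coll_train["marriage_statusXage_codes"] = (
    coll_train["marital_statusXage2"].astype(marriage_cat_type)
)

print(coll_train["marriage_statusXage_codes"].dtype)
print("marriage_statusXage_codes" in full.columns)
```

The new column lives only on the copy; `full` is left without it. If the intent is to modify the larger frame instead, the assignment would need to target that frame directly.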

Here is an illustration of setting a value on a slice.

import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed in later pandas releases

print(pd.__version__)

csvdata = StringIO("""date,LASTA,LASTB,LASTC
1999-03-15,2.5597,8.20145,16.900
1999-03-31,2.7724,7.73057,16.955
1999-04-01,2.8321,7.63714,17.500
1999-04-06,2.8537,7.63703,17.750""")

df = pd.read_csv(csvdata, sep=",", index_col="date", parse_dates=True, infer_datetime_format=True)

pd.set_option('chained_assignment','warn')

a_slice = df['1999-03-31':'1999-04-01']
print(id(df), id(a_slice))
# generates the warning
a_slice['LASTA'] = 10
# original does not have the data set on a slice!
print(df[df['LASTA'] == 10]['LASTA'].any())

# create a new object to which values can be set, no warning.
a_slice = a_slice.copy()
a_slice['LASTA'] = 10
print(a_slice[a_slice['LASTA'] == 10]['LASTA'].any())

Result

0.20.3
(4549520208, 4594637776)
slicecopy.py:20: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  a_slice['LASTA'] = 10
False
True
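If the goal is the opposite one, changing the original frame rather than an independent copy, the warning's own suggestion applies: do the row and column selection in a single .loc call on the original. A minimal sketch using the same sample data as above (the date range is just the one from the slice example):

```python
from io import StringIO

import pandas as pd

csvdata = StringIO("""date,LASTA,LASTB,LASTC
1999-03-15,2.5597,8.20145,16.900
1999-03-31,2.7724,7.73057,16.955
1999-04-01,2.8321,7.63714,17.500
1999-04-06,2.8537,7.63703,17.750""")

df = pd.read_csv(csvdata, index_col="date", parse_dates=True)

# A single .loc call on the original frame: rows and column are chosen
# together, so there is no intermediate slice object to assign into.
# Label slices on a DatetimeIndex include both endpoints.
df.loc["1999-03-31":"1999-04-01", "LASTA"] = 10

# Count how many rows of the original now hold the new value.
print((df["LASTA"] == 10).sum())
```

Here the two rows inside the date range are updated in df itself, while the rows outside the range keep their original values.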

Copying a pandas object column as a category type gives a warning
