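I have this code that triggers the warning below, but I don't know why it happens or how to fix it:
self.data is a pandas DataFrame.
self.tag_freq is built by calling value_counts() on a column of self.data.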
def edit_tag(lst, old_set, new):
    return [elem if elem not in old_set else new for elem in lst]
toretag = self.tag_freq['count'] < lim
count = self.tag_freq['count'][toretag].sum()
classlist = list(self.tag_freq[toretag].index)
self.data['newtags'] = self.data['tags'].apply(lambda x: edit_tag(lst=x, old_set=classlist, new='underrep'))
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.data['newtags'] = self.data['tags'].apply(lambda x: edit_tag(lst=x, old_set=classlist, new='underrep'))
I tried changing the last line of the code above to use .loc, but it still triggers the warning.
self.data.loc[:, 'newtags'] = self.data['tags'].apply(lambda x: edit_tag(lst=x, old_set=classlist, new='underrep'))
I tried to reproduce the warning with the following code, but without success.
import pandas as pd
df = pd.DataFrame(data=None)
df['col0'] = list(range(100))
df['col0'] = df['col0'].apply(lambda x: x*2)
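For comparison, here is a minimal sketch of a setup that is expected to trigger the warning on pandas versions before 3.0 (without copy-on-write enabled). It assumes, hypothetically, that self.data was itself produced by filtering another DataFrame; the names base, subset and keep are made up for illustration and do not come from the code above.

```python
import warnings

import pandas as pd

# Hypothetical parent frame; "keep" is an illustrative column, not from the question.
base = pd.DataFrame({
    "tags": [["a", "b"], ["c"], ["a"]],
    "keep": [True, False, True],
})

# Boolean filtering returns a new frame that pandas (before 3.0) remembers
# as having been derived from `base`.
subset = base[base["keep"]]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Same pattern as in the question: assign a new column on the derived frame.
    subset["newtags"] = subset["tags"].apply(len)

# Compare by class name so the check does not depend on where the
# warning class lives in a given pandas version.
triggered = any(w.category.__name__ == "SettingWithCopyWarning" for w in caught)
print("SettingWithCopyWarning raised:", triggered)
```

The difference from the attempt above is that df there was created directly, so pandas has no record of it being derived from another frame.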
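EDIT: The post (Getting SettingWithCopyWarning warning even after using .loc in pandas [duplicate]) says the problem is that I'm operating on a copy of a DataFrame. So to fix it, I make sure the DataFrame I'm working on is an independent object (an explicit copy) rather than a copy of a slice: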
data = self.data.copy()
data['newtags'] = data['tags'].apply(lambda x: edit_tag(lst=x, old_set=classlist, new='underrep'))
Then I update the self.data object with data.
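Put together, the copy-then-write-back workflow might look like the sketch below. The Tagger class, the retag method, the n column and the sample data are hypothetical stand-ins for the real class, which is not shown here; only edit_tag and the two lines inside retag come from the code above.

```python
import pandas as pd


def edit_tag(lst, old_set, new):
    # Replace every element found in old_set with `new`; keep the rest unchanged.
    return [elem if elem not in old_set else new for elem in lst]


class Tagger:  # hypothetical wrapper standing in for the real class
    def __init__(self, data):
        self.data = data

    def retag(self, classlist):
        # Work on an explicit, independent copy so the assignment
        # does not target a frame derived from a slice.
        data = self.data.copy()
        data['newtags'] = data['tags'].apply(
            lambda x: edit_tag(lst=x, old_set=classlist, new='underrep')
        )
        # Write the result back to the attribute.
        self.data = data


# Illustrative data: the filter makes self.data a frame derived from `source`.
source = pd.DataFrame({
    "tags": [["a", "b"], ["c"], ["d"], ["a", "c"]],
    "n": [2, 1, 0, 2],
})
tagger = Tagger(source[source["n"] > 0])
tagger.retag(classlist={"c"})
print(tagger.data["newtags"].tolist())
```

Rebinding self.data to the copy means any other reference to the old frame will not see the new column, which may or may not matter depending on how the object is used elsewhere.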