Here is the sample data:
import pandas as pd
cols = ['Country','Name','SomeNumber','SomeDate']
sourceData = [('WI','Vivian',34,'#1985-01-01#'),
('IND','Sam',56,'#1988-02-01#'),
('NZ','Richard',324,'#1987-07-01#'),
('AUS','Don',98,'#1998-07-12#'),
('SL','Simth',101,'#2001-07-12#'),]
x = pd.DataFrame(sourceData,columns=cols)
x
Country Name SomeNumber SomeDate
0 WI Vivian 34 #1985-01-01#
1 IND Sam 56 #1988-02-01#
2 NZ Richard 324 #1987-07-01#
3 AUS Don 98 #1998-07-12#
4 SL Simth 101 #2001-07-12#
What I want to do is update every column except 'Name' so that every value in it is 'MISSING'.
The updated DataFrame should look like this:
Country Name SomeNumber SomeDate
0 MISSING Vivian MISSING MISSING
1 MISSING Sam MISSING MISSING
2 MISSING Richard MISSING MISSING
3 MISSING Don MISSING MISSING
4 MISSING Simth MISSING MISSING
Note that I don't want to do something like this, because in the real world I have 114 columns:
x['Country'] = 'MISSING'
x['SomeNumber'] = 'MISSING'
x['SomeDate'] = 'MISSING'
I tried:
cols.remove('Name')
x[cols] = 'MISSING'
But it gives me the following warning, which I would like to avoid:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
inTardisMissingInSource[cols] = 'MISSING'
C:\tardis\desktop\environment\python\lib\site-packages\pandas\core\indexing.py:477: SettingWithCopyWarning:
Answer 0 (score: 4)
SettingWithCopyWarning is a good indication that you are assigning through indexing in the wrong place. Use df.loc instead, like this:
In [1430]: x.loc[:, x.columns.difference(['Name'])] = 'MISSING'
In [1431]: x
Out[1431]:
Country Name SomeNumber SomeDate
0 MISSING Vivian MISSING MISSING
1 MISSING Sam MISSING MISSING
2 MISSING Richard MISSING MISSING
3 MISSING Don MISSING MISSING
4 MISSING Simth MISSING MISSING
The key part is x.columns.difference([...]). Pass in a list of the column headers to exclude, and those columns will not be selected for the assignment.
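One detail worth knowing: Index.difference returns its labels sorted by default, so the order can differ from the frame's column order. That does not affect the assignment itself. The following is a minimal sketch on a two-row version of the sample frame; Index.drop is shown as an order-preserving alternative and is a suggestion here, not part of the original answer.

```python
import pandas as pd

cols = ['Country', 'Name', 'SomeNumber', 'SomeDate']
x = pd.DataFrame([('WI', 'Vivian', 34, '#1985-01-01#'),
                  ('IND', 'Sam', 56, '#1988-02-01#')], columns=cols)

# difference() returns the remaining labels in sorted order
sorted_targets = x.columns.difference(['Name'])

# drop() keeps the labels in their original column order
ordered_targets = x.columns.drop(['Name'])

print(list(sorted_targets))
print(list(ordered_targets))
```

Either result can be passed as the column selector; the choice only matters if you later rely on the order of the selected labels.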
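Note that this mixed assignment changes the dtype of the affected columns (SomeNumber, for example, is no longer an integer column), so use it with care.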
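To make the dtype change concrete, here is a small sketch on a reduced, hypothetical two-column frame. It overwrites the column on a copy so the original stays available for comparison, and it checks only whether the column is still an integer column, since the exact string dtype reported can vary between pandas versions.

```python
import pandas as pd

x = pd.DataFrame({'Name': ['Vivian', 'Sam'], 'SomeNumber': [34, 56]})

# The column starts out as an integer column
before = pd.api.types.is_integer_dtype(x['SomeNumber'])

y = x.copy()                 # work on a copy so x keeps its original dtypes
y['SomeNumber'] = 'MISSING'  # every value in the column is now a string

# After the overwrite the column no longer holds integers
after = pd.api.types.is_integer_dtype(y['SomeNumber'])

print(before, after)
```

Once the column holds strings, numeric operations on it no longer apply as they did before.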
If you don't want to assign in place, you can use df.assign by unpacking a dictionary:
In [1435]: x.assign(**{ k : 'MISSING' for k in x.columns.difference(['Name'])})
Out[1435]:
Country Name SomeNumber SomeDate
0 MISSING Vivian MISSING MISSING
1 MISSING Sam MISSING MISSING
2 MISSING Richard MISSING MISSING
3 MISSING Don MISSING MISSING
4 MISSING Simth MISSING MISSING
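Returning to the warning in the question: it typically appears when the frame being modified was itself produced by slicing or filtering another frame. The question does not show how inTardisMissingInSource was built, so the sketch below only assumes that situation; the names source and subset and the filter condition are hypothetical. Taking an explicit .copy() makes the subset an independent frame, so the assignment cannot touch the original.

```python
import pandas as pd

source = pd.DataFrame({
    'Country': ['WI', 'IND', 'NZ'],
    'Name': ['Vivian', 'Sam', 'Richard'],
    'SomeNumber': [34, 56, 324],
})

# Hypothetical filter step; .copy() detaches the result from `source`
subset = source[source['SomeNumber'] > 50].copy()

# Overwrite every column except 'Name' on the independent copy
targets = list(subset.columns.drop(['Name']))
subset[targets] = 'MISSING'

print(subset)
print(source)  # the original frame still holds its own values
```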