Reassign multiple columns with a single value

Time: 2017-08-25 14:25:25

Tags: python pandas dataframe

Here is the sample data:

import pandas as pd
cols = ['Country','Name','SomeNumber','SomeDate']
sourceData  = [('WI','Vivian',34,'#1985-01-01#'),
               ('IND','Sam',56,'#1988-02-01#'),
               ('NZ','Richard',324,'#1987-07-01#'),
               ('AUS','Don',98,'#1998-07-12#'),
               ('SL','Simth',101,'#2001-07-12#'),]
x = pd.DataFrame(sourceData,columns=cols)
x
  Country     Name  SomeNumber      SomeDate
0      WI   Vivian          34  #1985-01-01#
1     IND      Sam          56  #1988-02-01#
2      NZ  Richard         324  #1987-07-01#
3     AUS      Don          98  #1998-07-12#
4      SL    Simth         101  #2001-07-12#

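What I want to do is overwrite every value in every column of the table with 'MISSING', except for the 'Name' column.

The updated DataFrame should look like this: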

   Country     Name SomeNumber SomeDate
0  MISSING   Vivian    MISSING  MISSING
1  MISSING      Sam    MISSING  MISSING
2  MISSING  Richard    MISSING  MISSING
3  MISSING      Don    MISSING  MISSING
4  MISSING    Simth    MISSING  MISSING

Note that I don't want to do something like this, because in the real world I have 114 columns:

x['Country'] = 'MISSING'
x['SomeNumber'] = 'MISSING'
x['SomeDate'] = 'MISSING'

I tried:

cols.remove('Name')
x[cols] = 'MISSING'

But it gives me the following warning, which I would like to avoid:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  inTardisMissingInSource[cols] = 'MISSING'
C:\tardis\desktop\environment\python\lib\site-packages\pandas\core\indexing.py:477: SettingWithCopyWarning:

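1 Answer:

Answer 0 (score: 4)

The SettingWithCopyWarning is a good indication that you are indexing in the wrong place for an assignment. You should use df.loc instead, like this: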

In [1430]: x.loc[:, x.columns.difference(['Name'])] = 'MISSING'

In [1431]: x
Out[1431]: 
   Country     Name SomeNumber SomeDate
0  MISSING   Vivian    MISSING  MISSING
1  MISSING      Sam    MISSING  MISSING
2  MISSING  Richard    MISSING  MISSING
3  MISSING      Don    MISSING  MISSING
4  MISSING    Simth    MISSING  MISSING

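The key part is x.columns.difference([...]). Pass in a list of the column headers you want to exclude, and those columns will not be selected for the assignment.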
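To see what x.columns.difference(['Name']) selects on its own, here is a minimal sketch on a cut-down version of the sample frame. The name to_fill is only for illustration. The remaining labels come back in sorted order rather than in the frame's column order, which makes no difference when a single scalar is assigned to all of them.

```python
import pandas as pd

# A cut-down version of the sample frame from the question.
x = pd.DataFrame(
    [('WI', 'Vivian', 34, '#1985-01-01#'),
     ('IND', 'Sam', 56, '#1988-02-01#')],
    columns=['Country', 'Name', 'SomeNumber', 'SomeDate'],
)

# Index.difference returns the labels that are NOT in the list passed in.
to_fill = x.columns.difference(['Name'])
print(list(to_fill))  # ['Country', 'SomeDate', 'SomeNumber']

# More columns can be protected by extending the exclusion list.
print(list(x.columns.difference(['Name', 'Country'])))  # ['SomeDate', 'SomeNumber']
```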

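Note that this mixed assignment changes the dtype of the affected columns (SomeNumber, for example, is no longer an integer column), so use it with caution.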
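The dtype change is easy to check. The sketch below is only an illustration: it builds a fresh two-column frame and uses plain column assignment instead of .loc, on the assumption that replacing whole columns behaves the same across pandas versions. Newer releases may warn about, or refuse, upcasting an integer column through .loc.

```python
import pandas as pd

x = pd.DataFrame({'Name': ['Vivian', 'Sam'], 'SomeNumber': [34, 56]})

# Remember the dtype before overwriting the column.
before = x['SomeNumber'].dtype
print(pd.api.types.is_integer_dtype(before))  # True

# Overwrite every column except 'Name' with the string marker.
cols = list(x.columns.difference(['Name']))
x[cols] = 'MISSING'

# The column now holds strings, so it is no longer an integer column.
after = x['SomeNumber'].dtype
print(pd.api.types.is_integer_dtype(after))  # False
```

If the numeric dtype matters downstream, a string marker such as 'MISSING' may not be the right choice for those columns.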

If you don't want to assign in place, you can use df.assign by unpacking a dictionary:

In [1435]: x.assign(**{ k : 'MISSING' for k in x.columns.difference(['Name'])})
Out[1435]: 
   Country     Name SomeNumber SomeDate
0  MISSING   Vivian    MISSING  MISSING
1  MISSING      Sam    MISSING  MISSING
2  MISSING  Richard    MISSING  MISSING
3  MISSING      Don    MISSING  MISSING
4  MISSING    Simth    MISSING  MISSING
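Because assign returns a new frame, the original is left untouched. Here is a short sketch of that behaviour, with replacements and filled as illustrative names. It assumes the column labels are strings, since they are passed to assign as keyword arguments.

```python
import pandas as pd

x = pd.DataFrame(
    [('WI', 'Vivian', 34), ('IND', 'Sam', 56)],
    columns=['Country', 'Name', 'SomeNumber'],
)

# Build one keyword argument per column that should be overwritten.
replacements = {col: 'MISSING' for col in x.columns.difference(['Name'])}
filled = x.assign(**replacements)

print(filled)  # Country and SomeNumber hold 'MISSING', Name is kept
print(x)       # the original frame still holds its own values
```

To keep the result, bind it to a name (or rebind x itself), since nothing is modified in place.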