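Replacing a column's values with another column's values based on a condition

Time: 2019-01-10 03:58:27

Tags: pandas

I get an error when I try to replace the values in one column with the values from another column based on a condition.

Here is the code...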

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': 1.,
                   'B': pd.Timestamp('20130102'),
                   'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                   'D': [1, 2, 1, 3],
                   'E': pd.Categorical(["test", "train", "test", "train"]),
                   'F': 'foo'})

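Where the value in column D == 1, I want to replace the value in column E with the value from column F.

I tried the following alternatives...

Alternative A: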

df[df.D == 1]['E'] = df[df.D == 1]['F']

This gives SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
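
The warning points at chained indexing: df[df.D == 1] builds a new DataFrame first, and the ['E'] = ... assignment then lands on that temporary object rather than on df. A minimal sketch of the effect, assuming a cut-down frame in which E holds plain strings (not the Categorical from the question) so that only the copy behaviour is visible:

```python
import warnings

import pandas as pd

# Cut-down, hypothetical frame: E holds plain strings here, not a Categorical.
df = pd.DataFrame({
    "D": [1, 2, 1, 3],
    "E": ["test", "train", "test", "train"],
    "F": "foo",
})

with warnings.catch_warnings():
    # Silence the chained-assignment warning so the output stays readable.
    warnings.simplefilter("ignore")
    # Chained indexing: the assignment targets a temporary copy of the rows.
    df[df["D"] == 1]["E"] = df[df["D"] == 1]["F"]

after_chained = df["E"].tolist()
print(after_chained)  # df itself is unchanged

# A single .loc call selects rows and column in one step and writes into df.
mask = df["D"] == 1
df.loc[mask, "E"] = df.loc[mask, "F"]
after_loc = df["E"].tolist()
print(after_loc)
```

With a single .loc call there is no intermediate object, so the write reaches the original frame.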

Alternative B:

mask = df['D'] == 1
df.loc[mask, 'E'] = df.loc[mask, 'F']

...gives ValueError: Cannot setitem on a Categorical with a new category, set the categories first
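
The error arises because a Categorical column only accepts values from its declared categories, and 'foo' is not among ['test', 'train']. One possible fix for a frame that already exists, sketched here on a cut-down frame with only D, E and F, is to register the new category with Series.cat.add_categories before assigning:

```python
import pandas as pd

# Cut-down version of the question's frame (columns A, B, C omitted).
df = pd.DataFrame({
    "D": [1, 2, 1, 3],
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",
})
print(list(df["E"].cat.categories))  # 'foo' is not a category yet

# Register 'foo' as an allowed category; existing values are untouched.
df["E"] = df["E"].cat.add_categories(["foo"])

# The .loc assignment from Alternative B is now permitted.
mask = df["D"] == 1
df.loc[mask, "E"] = df.loc[mask, "F"]
print(df["E"].tolist())
```

The column keeps its category dtype, and 'test' remains a declared category even though no row uses it any more.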

Alternative C:

df.loc[mask, 'E'].replace(df.loc[mask, 'F'])
df

...does nothing at all.
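
Nothing changes because Series.replace returns a new Series and the result is never assigned back; df.loc[mask, 'E'] is itself a fresh object, so the call has nothing in df to modify. A sketch of a non-mutating alternative, assuming it is acceptable to go through object dtype and rebuild the categorical afterwards: Series.mask swaps in the values from F where the condition holds, and the result is assigned to the column.

```python
import pandas as pd

# Cut-down version of the question's frame (columns A, B, C omitted).
df = pd.DataFrame({
    "D": [1, 2, 1, 3],
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",
})
mask = df["D"] == 1

# Work on plain Python objects so that any value from F is allowed,
# replace the masked rows, then rebuild the categorical column.
replaced = df["E"].astype(object).mask(mask, df["F"])
df["E"] = replaced.astype("category")
print(df["E"].tolist())
```

One consequence of this route is that the rebuilt column only declares the categories that still occur, so 'test' is dropped from the category list here.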

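Where am I going wrong? What is the correct way to do this?

1 answer:

Answer 0: (score: 1)

Declaring the new value as one of the Categorical's categories makes it work: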

In [7]: df = pd.DataFrame({ 'A' : 1.,
   ...: 'B' : pd.Timestamp('20130102'),
   ...: 'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
   ...: 'D' : [1, 2, 1, 3],
   ...: 'E' : pd.Categorical(["test","train","test","train"], categories=['test', 'train', 'foo']),
   ...: 'F' : 'foo' })

In [8]: df
Out[8]: 
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  1   test  foo
1  1.0 2013-01-02  1.0  2  train  foo
2  1.0 2013-01-02  1.0  1   test  foo
3  1.0 2013-01-02  1.0  3  train  foo

In [9]: df.loc[df.D == 1, 'E'] = df.F

In [10]: df
Out[10]: 
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  1    foo  foo
1  1.0 2013-01-02  1.0  2  train  foo
2  1.0 2013-01-02  1.0  1    foo  foo
3  1.0 2013-01-02  1.0  3  train  foo
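
Note that the right-hand side in In [9] is the whole column df.F, not just the masked rows; pandas aligns it on the index and uses only the rows selected by the mask. A small sketch of that alignment, assuming a hypothetical F column with a distinct value per row (not part of the original data) and all of those values declared as categories up front:

```python
import pandas as pd

f_values = ["w", "x", "y", "z"]  # hypothetical per-row replacement values
df = pd.DataFrame({
    "D": [1, 2, 1, 3],
    "E": pd.Categorical(
        ["test", "train", "test", "train"],
        categories=["test", "train"] + f_values,
    ),
    "F": f_values,
})

# Rows 0 and 2 match the condition, so they receive F from rows 0 and 2.
df.loc[df["D"] == 1, "E"] = df["F"]
print(df["E"].tolist())
```

Rows where D is not 1 keep their original E value, even though F has a value for them as well.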