Error when conditionally replacing the values of one column with the values of another column.
Here is the code:
import pandas as pd
import numpy as np
df = pd.DataFrame({ 'A' : 1.,
'B' : pd.Timestamp('20130102'),
'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
'D' : [1, 2, 1, 3],
'E' : pd.Categorical(["test","train","test","train"]),
'F' : 'foo' })
I want to replace the values in column E with the values from column F wherever the value in column D == 1.
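A quick check of the column types helps explain the failures below: E is a categorical column, so it may only hold values from its fixed set of categories, and the replacement value 'foo' from F is not one of them. This is a minimal diagnostic sketch using the question's own DataFrame.

```python
import pandas as pd

# Same DataFrame as in the question
df = pd.DataFrame({'A': 1.,
                   'B': pd.Timestamp('20130102'),
                   'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                   'D': [1, 2, 1, 3],
                   'E': pd.Categorical(["test", "train", "test", "train"]),
                   'F': 'foo'})

# E is categorical: its allowed values are fixed by its categories
print(df['E'].dtype)                     # category
print(df['E'].cat.categories.tolist())   # ['test', 'train']

# The value we want to copy in from F is not an allowed category
print('foo' in df['E'].cat.categories)   # False
```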
I tried the following alternatives:
Alternative A:
df[df.D == 1]['E'] = df[df.D == 1]['F']
This gives a SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
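The warning in Alternative A is a symptom of chained indexing: df[df.D == 1] builds a new, temporary DataFrame, and ['E'] = ... then modifies that temporary object rather than df. The sketch below (a trimmed version of the question's DataFrame) silences the warning only to show that df itself is left unchanged; newer pandas versions may emit a different chained-assignment warning, but the outcome is the same.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'D': [1, 2, 1, 3],
                   'E': pd.Categorical(["test", "train", "test", "train"]),
                   'F': 'foo'})

with warnings.catch_warnings():
    # Hide the SettingWithCopyWarning / chained-assignment warning for this demo
    warnings.simplefilter("ignore")
    # Boolean indexing returns a new DataFrame; the assignment targets that copy
    df[df.D == 1]['E'] = df[df.D == 1]['F']

print(df['E'].tolist())  # ['test', 'train', 'test', 'train'] (df is unchanged)
```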
Alternative B:
mask = df['D'] == 1
df.loc[mask, 'E'] = df.loc[mask, 'F']
...which gives ValueError: Cannot setitem on a Categorical with a new category, set the categories first
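The .loc call in Alternative B is the right tool; the error comes only from E's categorical dtype. If E does not need to stay categorical, one option (not from the question or the answer) is to convert it to a plain object column first, after which the same assignment succeeds. This is a sketch under that assumption; E loses its categorical dtype.

```python
import pandas as pd

df = pd.DataFrame({'D': [1, 2, 1, 3],
                   'E': pd.Categorical(["test", "train", "test", "train"]),
                   'F': 'foo'})

# Drop the categorical dtype so E can hold any value
df['E'] = df['E'].astype(object)

mask = df['D'] == 1
# The same .loc assignment as in Alternative B now works
df.loc[mask, 'E'] = df.loc[mask, 'F']

print(df['E'].tolist())  # ['foo', 'train', 'foo', 'train']
```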
Alternative C:
df.loc[mask, 'E'].replace(df.loc[mask, 'F'])
df
...does nothing at all.
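Alternative C has no effect because replace returns a new Series instead of modifying df, and that result is never assigned back. A different technique, swapped in here and not taken from the thread, is to compute the whole new column with np.where and assign it to df['E'] in one step. As a side effect E becomes an ordinary (non-categorical) column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'D': [1, 2, 1, 3],
                   'E': pd.Categorical(["test", "train", "test", "train"]),
                   'F': 'foo'})

mask = df['D'] == 1

# Unlike Alternative C, the result is assigned back to the column.
# np.where takes F where mask is True and keeps the current E otherwise.
df['E'] = np.where(mask, df['F'], df['E'])

print(df['E'].tolist())  # ['foo', 'train', 'foo', 'train']
```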
Where am I going wrong? What is the correct way to do this?
Answer (score: 1)
Including the new value in the Categorical's categories makes it work:
In [7]: df = pd.DataFrame({ 'A' : 1.,
...: 'B' : pd.Timestamp('20130102'),
...: 'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
...: 'D' : [1, 2, 1, 3],
...: 'E' : pd.Categorical(["test","train","test","train"], categories=['test', 'train', 'foo']),
...: 'F' : 'foo' })
In [8]: df
Out[8]:
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  1   test  foo
1  1.0 2013-01-02  1.0  2  train  foo
2  1.0 2013-01-02  1.0  1   test  foo
3  1.0 2013-01-02  1.0  3  train  foo
In [9]: df.loc[df.D == 1, 'E'] = df.F
In [10]: df
Out[10]:
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  1    foo  foo
1  1.0 2013-01-02  1.0  2  train  foo
2  1.0 2013-01-02  1.0  1    foo  foo
3  1.0 2013-01-02  1.0  3  train  foo
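The answer rebuilds the DataFrame with 'foo' already among E's categories. If the DataFrame already exists, the same idea can likely be applied in place with Series.cat.add_categories, which keeps E categorical. A minimal sketch, using a trimmed version of the question's DataFrame:

```python
import pandas as pd

# E is created without 'foo' in its categories, as in the question
df = pd.DataFrame({'D': [1, 2, 1, 3],
                   'E': pd.Categorical(["test", "train", "test", "train"]),
                   'F': 'foo'})

# Extend the existing categories instead of rebuilding the DataFrame
df['E'] = df['E'].cat.add_categories(['foo'])

mask = df['D'] == 1
df.loc[mask, 'E'] = df.loc[mask, 'F']

print(df['E'].tolist())                 # ['foo', 'train', 'foo', 'train']
print(df['E'].cat.categories.tolist())  # ['test', 'train', 'foo']
```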