np.where gave me a lot of errors, so I am looking for a solution with df.loc.
This is the warning I get with np.where:
C:\Users\xxx\AppData\Local\Continuum\Anaconda2\lib\site-packages\ipykernel\__main__.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
if __name__ == '__main__':
I am working with the following DataFrame df:
df = pd.DataFrame({'Column_A': ['AAA','AAA','ABC','CDE'],'checked': ['0','0','1','0'],'duplicate': ['True','True','False','False']})
Column_A checked duplicate
0 AAA 0 True
1 AAA 0 True
2 ABC 1 False
3 CDE 0 False
I want to create an additional flag column when checked is 0 and duplicate is True.
I tried this, and it did not work:
df['flag'] = (np.where((df['checked'] == 'Y') &(df['duplicate'] == 'True'), 'Y', '0'))
TypeError: invalid type comparison
I also tried it with df.loc:
df['flag'] = (df.loc[df['checked'] == 'Y']& df.loc[df['duplicate'] == 'True'], 'Y','0')
TypeError: invalid type comparison
I get the same error!
Answer 0 (score: 7)
I think the duplicate column should hold boolean values rather than strings, so remove the quotes (') around True and False:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA','AAA','ABC','CDE'],
                   'checked': ['0','0','1','0'],
                   'duplicate': [True, True, False, False]})
df['flag'] = np.where((df['checked'] == 'Y') &(df['duplicate'] == True), 'Y', '0')
print (df)
Column_A checked duplicate flag
0 AAA 0 True 0
1 AAA 0 True 0
2 ABC 1 False 0
3 CDE 0 False 0
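If the duplicate column already arrives as the strings 'True' and 'False', as in the question's frame, one option is to convert it to real booleans before building any mask. This is only a minimal sketch, assuming the column contains nothing but those two strings:

```python
import pandas as pd

# The question's frame, where 'duplicate' holds strings rather than booleans.
df = pd.DataFrame({'Column_A': ['AAA', 'AAA', 'ABC', 'CDE'],
                   'checked': ['0', '0', '1', '0'],
                   'duplicate': ['True', 'True', 'False', 'False']})

# Map the string labels to real booleans.
# Assumption: only 'True' and 'False' occur; any other value would become NaN.
df['duplicate'] = df['duplicate'].map({'True': True, 'False': False})

print(df['duplicate'].tolist())
```

After the conversion, the column can be used directly as a boolean mask.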
When comparing against a boolean column, the == True can be omitted:
df['flag'] = np.where((df['checked'] == 'Y') &(df['duplicate']), 'Y', '0')
print (df)
Column_A checked duplicate flag
0 AAA 0 True 0
1 AAA 0 True 0
2 ABC 1 False 0
3 CDE 0 False 0
To test the checked column, keep the quotes ('), because its values are strings:
df['flag'] = np.where((df['checked'] == '0') &(df['duplicate'] == True), 'Y', '0')
print (df)
Column_A checked duplicate flag
0 AAA 0 True Y
1 AAA 0 True Y
2 ABC 1 False 0
3 CDE 0 False 0
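For reference, the np.where variant can be put together as one self-contained script. This is a sketch only: the intermediate name mask is introduced here for readability and is not required by np.where.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAA', 'ABC', 'CDE'],
                   'checked': ['0', '0', '1', '0'],          # strings, so compare with '0'
                   'duplicate': [True, True, False, False]})  # real booleans

# Each condition sits in its own parentheses because & binds tighter than ==.
mask = (df['checked'] == '0') & (df['duplicate'])

# 'Y' where the mask is True, '0' everywhere else.
df['flag'] = np.where(mask, 'Y', '0')

print(df)
```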
Edit:
Solution with loc:
df['flag'] = '0'
mask = (df['checked'] == '0') &(df['duplicate'])
df.loc[mask, 'flag'] = 'Y'
print (df)
Column_A checked duplicate flag
0 AAA 0 True Y
1 AAA 0 True Y
2 ABC 1 False 0
3 CDE 0 False 0
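The SettingWithCopyWarning quoted in the question usually means that df was itself produced by slicing another DataFrame. The question does not show that step, so the following is a sketch under that assumption; the name source and the filter on Column_A are hypothetical. Taking an explicit .copy() makes df an independent frame before the new column is assigned:

```python
import pandas as pd

# Hypothetical parent frame; the question does not show where df came from.
source = pd.DataFrame({'Column_A': ['AAA', 'AAA', 'ABC', 'CDE'],
                       'checked': ['0', '0', '1', '0'],
                       'duplicate': [True, True, False, False]})

# Hypothetical filter step. The explicit .copy() makes df independent of source.
df = source[source['Column_A'] != 'CDE'].copy()

# Same loc pattern as in the answer: set a default, then overwrite the masked rows.
df['flag'] = '0'
mask = (df['checked'] == '0') & (df['duplicate'])
df.loc[mask, 'flag'] = 'Y'

print(df)
```

The parent frame source is left unchanged; only the copy gains the flag column.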