Suppose I have a DataFrame all_data, for example:
Id Zone Neighb
1 NaN IDOTRR
2 RL Veenker
3 NaN IDOTRR
4 RM Crawfor
5 NaN Mitchel
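I want to fill in the missing values in the 'Zone' column so that where 'Neighb' is 'IDOTRR' I set 'Zone' to 'RM', and where 'Neighb' is 'Mitchel' I set it to 'RL'. My attempt (the real column names are MSZoning and Neighborhood):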
all_data.loc[all_data.MSZoning.isnull()
& all_data.Neighborhood == "IDOTRR", "MSZoning"] = "RM"
all_data.loc[all_data.MSZoning.isnull()
& all_data.Neighborhood == "Mitchel", "MSZoning"] = "RL"
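I get:
TypeError: invalid type comparison
C:\Users\pprun\Anaconda3\lib\site-packages\pandas\core\ops.py:798: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  result = getattr(x, name)(y)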
I'm sure this should be simple, but I've been fiddling with it for far too long. Please help.
Answer 0 (score: 3)
Use np.select, i.e.:
df['Zone'] = np.select([df['Neighb'] == 'IDOTRR',df['Neighb'] == 'Mitchel'],['RM','RL'],df['Zone'])
Id Zone Neighb
0 1 RM IDOTRR
1 2 RL Veenker
2 3 RM IDOTRR
3 4 RM Crawfor
4 5 RL Mitchel
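For reference, here is a self-contained version of that np.select call that rebuilds the small sample frame from the question. It assumes missing zones are stored as NaN. Note that, as written, it sets Zone for every IDOTRR or Mitchel row, whether or not the value was missing.

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing zones are NaN.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Zone": [np.nan, "RL", np.nan, "RM", np.nan],
    "Neighb": ["IDOTRR", "Veenker", "IDOTRR", "Crawfor", "Mitchel"],
})

# The first matching condition wins; rows matching none keep their current Zone.
df["Zone"] = np.select(
    [df["Neighb"] == "IDOTRR", df["Neighb"] == "Mitchel"],
    ["RM", "RL"],
    df["Zone"],
)
print(df)
```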
If you need the conditions from your question (only fill where MSZoning is missing), you can use:
import numpy as np
# Boolean mask of condition 1
m1 = (all_data.MSZoning.isnull()) & (all_data.Neighborhood == "IDOTRR")
# Boolean mask of condition 2
m2 = (all_data.MSZoning.isnull()) & (all_data.Neighborhood == "Mitchel")
# Rows matching neither mask keep their current MSZoning value
all_data["MSZoning"] = np.select([m1, m2], ['RM', 'RL'], all_data["MSZoning"])
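As a runnable sketch of the masked version, the snippet below uses a made-up all_data with the question's column names. Row 6 is a hypothetical extra row whose zoning is already known, included to show that the isnull() part of each mask keeps it from being overwritten.

```python
import numpy as np
import pandas as pd

# Hypothetical all_data mirroring the question's sample; row 6 is made up.
all_data = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6],
    "MSZoning": [np.nan, "RL", np.nan, "RM", np.nan, "C (all)"],
    "Neighborhood": ["IDOTRR", "Veenker", "IDOTRR", "Crawfor", "Mitchel", "IDOTRR"],
})

missing = all_data["MSZoning"].isnull()
# Parentheses around each comparison are required: & binds tighter than ==.
m1 = missing & (all_data["Neighborhood"] == "IDOTRR")
m2 = missing & (all_data["Neighborhood"] == "Mitchel")

# Rows matching neither mask (including row 6) keep their current MSZoning.
all_data["MSZoning"] = np.select([m1, m2], ["RM", "RL"], all_data["MSZoning"])
print(all_data)
```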
Answer 1 (score: 2)
df.Zone=df.Zone.fillna(df.Neighb.replace({'IDOTRR':'RM','Mitchel':'RL'}))
df
Out[784]:
Id Zone Neighb
0 1 RM IDOTRR
1 2 RL Veenker
2 3 RM IDOTRR
3 4 RM Crawfor
4 5 RL Mitchel
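One caveat with replace(): neighborhoods that are not in the dict pass through unchanged, so a missing Zone in such a neighborhood would be filled with the neighborhood's name. A sketch comparing it with Series.map, which yields NaN for unmapped keys. It uses the question's sample, except that row 4's Zone is set to NaN, a hypothetical change made only to expose the difference.

```python
import numpy as np
import pandas as pd

# Question's sample, but row 4's Zone is NaN (hypothetical) to show the pitfall.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Zone": [np.nan, "RL", np.nan, np.nan, np.nan],
    "Neighb": ["IDOTRR", "Veenker", "IDOTRR", "Crawfor", "Mitchel"],
})

zone_by_neighb = {"IDOTRR": "RM", "Mitchel": "RL"}

# replace() keeps unmapped names, so Crawfor's missing zone becomes "Crawfor".
via_replace = df["Zone"].fillna(df["Neighb"].replace(zone_by_neighb))

# map() turns unmapped names into NaN, so fillna() leaves those rows missing.
via_map = df["Zone"].fillna(df["Neighb"].map(zone_by_neighb))
print(via_replace.tolist())
print(via_map.tolist())
```

With map(), rows that have no rule simply stay NaN and can be handled separately.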
Answer 2 (score: 1)
In Python, & has higher precedence than ==:
http://www.annedawson.net/Python_Precedence.htm
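So when you write all_data.MSZoning.isnull() & all_data.Neighborhood == "Mitchel", it is interpreted as (all_data.MSZoning.isnull() & all_data.Neighborhood) == "Mitchel". Python first tries to AND a boolean Series with a Series of strings, and then checks whether the result equals the single string "Mitchel". The solution is to wrap each test in parentheses: (all_data.MSZoning.isnull()) & (all_data.Neighborhood == "Mitchel").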
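The precedence rule can be checked directly. This sketch parses the shape of the question's expression with the standard ast module (the names isnull and neighb are placeholders, never evaluated), shows the same rule on plain integers, and then builds the parenthesized mask on two small made-up Series.

```python
import ast

import pandas as pd

# Parse the shape of the question's expression without evaluating it.
tree = ast.parse("isnull & neighb == 'Mitchel'", mode="eval")
# The top-level node is the == comparison and its left operand is the & operation,
# i.e. Python reads it as (isnull & neighb) == 'Mitchel'.
top_node = type(tree.body).__name__        # "Compare"
left_node = type(tree.body.left).__name__  # "BinOp"

# The same rule with plain integers: (1 & 3) == 3 -> 1 == 3 -> False.
unparenthesized = 1 & 3 == 3
# Forcing the comparison first: 1 & (3 == 3) -> 1 & True -> 1.
parenthesized = 1 & (3 == 3)

# With parentheses, each side is a boolean Series before & combines them.
isnull = pd.Series([True, False, True])
neighb = pd.Series(["IDOTRR", "Veenker", "Mitchel"])
mask = isnull & (neighb == "Mitchel")
print(top_node, left_node, unparenthesized, parenthesized, mask.tolist())
```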
When I have many selectors, I sometimes assign them to variables and then AND them, for example:
null_zoning = all_data.MSZoning.isnull()
Mitchel_neighb = all_data.Neighborhood == "Mitchel"
all_data.loc[null_zoning & Mitchel_neighb, "MSZoning"] = "RL"
Not only does this fix the operator-precedence problem, it also means that all_data.loc[null_zoning & Mitchel_neighb, "MSZoning"] = "RL" fits on one line.
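The named-selector idea scales to several neighborhoods if the fill rules live in a dict. This is a sketch under that assumption: fill_rules is a hypothetical name, and all_data is a made-up frame mirroring the question's sample with the real column names.

```python
import numpy as np
import pandas as pd

# Hypothetical all_data mirroring the question's sample.
all_data = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "MSZoning": [np.nan, "RL", np.nan, "RM", np.nan],
    "Neighborhood": ["IDOTRR", "Veenker", "IDOTRR", "Crawfor", "Mitchel"],
})

# Hypothetical lookup table: neighborhood -> zoning to use when MSZoning is missing.
fill_rules = {"IDOTRR": "RM", "Mitchel": "RL"}

# Computed once up front; the rules target disjoint neighborhoods, so it stays valid.
null_zoning = all_data["MSZoning"].isnull()
for neighb, zoning in fill_rules.items():
    in_neighb = all_data["Neighborhood"] == neighb
    all_data.loc[null_zoning & in_neighb, "MSZoning"] = zoning
print(all_data)
```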