I have a pandas DataFrame that looks like the following:
Neighborhood    High School    ...
WOODLEY         LIBERTY
WOODLEY
COUNTRY CLUB
COUNTRY CLUB    HERITAGE
COUNTRY CLUB    HERITAGE
COUNTRY CLUB    TUSCORORA
...
As you can see, some entries are blank or incorrect, so I am trying to fix them. I started by writing a function like this.
def cleanHS(dat):
    if dat.Neighborhood == "WOODLEY":
        dat["High School"] == "LIBERTY"
    elif dat.Neighborhood == "COUNTRY CLUB":
        dat["High School"] == "HERITAGE"
    ...
    return dat
Then I call the function.
dirty["High School"] = dirty["High School"].map(cleanHS)
This is where I get an AttributeError:
AttributeError: 'str' object has no attribute 'Neighborhood'
How can I fix this?
Answer 0 (score: 0)
No loop is needed here. You can create a dictionary that maps each Neighborhood to its High School and apply it with map:
d = {"WOODLEY": "LIBERTY", "COUNTRY CLUB": "HERITAGE"}
dirty['High School'] = dirty['Neighborhood'].map(d)
Output
Neighborhood    High School
WOODLEY         LIBERTY
WOODLEY         LIBERTY
COUNTRY CLUB    HERITAGE
COUNTRY CLUB    HERITAGE
COUNTRY CLUB    HERITAGE
COUNTRY CLUB    HERITAGE
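One thing to keep in mind with this approach: Series.map returns NaN for any neighborhood that is not a key in the dictionary, so every row of High School gets overwritten. Here is a minimal sketch, assuming you want to keep the existing value for unmapped neighborhoods. The sample data is made up, and the ELMWOOD / CENTRAL row is purely hypothetical, added to show an unmapped key.

```python
import pandas as pd

# Hypothetical sample data; the ELMWOOD row is invented to show an unmapped key.
dirty = pd.DataFrame({
    "Neighborhood": ["WOODLEY", "WOODLEY", "COUNTRY CLUB", "COUNTRY CLUB", "ELMWOOD"],
    "High School": ["LIBERTY", None, None, "TUSCORORA", "CENTRAL"],
})

d = {"WOODLEY": "LIBERTY", "COUNTRY CLUB": "HERITAGE"}

# map() looks up each neighborhood in d; keys that are missing become NaN.
mapped = dirty["Neighborhood"].map(d)

# Fall back to the existing High School value wherever the lookup found nothing.
dirty["High School"] = mapped.fillna(dirty["High School"])

print(dirty)
```

If every neighborhood in the data has an entry in the dictionary, the fillna step changes nothing and can be dropped.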
Answer 1 (score: -1)
Here is the correct answer. Mapping with a dictionary is easy (as the other answer shows).
cleanHS = {"WOODLEY": "LIBERTY", "COUNTRY CLUB": "HERITAGE", ...}
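However, to map between the two columns correctly, you have to start from the Neighborhood column. Your code maps the values in "High School" to other values, but the column the mapping reads from should be "Neighborhood".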
dirty["High School"] = dirty["Neighborhood"].map(cleanHS)
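For context on the original error: Series.map calls the function once per element, so cleanHS received each individual High School value (a plain str for the filled-in rows) rather than a whole row, and a str has no Neighborhood attribute. If a row-wise function is really what you want, a rough sketch using DataFrame.apply with axis=1 might look like the following. The sample data is made up, and the branches use assignment (=) where the question's code had a comparison (==), which would have had no effect.

```python
import pandas as pd

# Hypothetical sample data modeled on the question.
dirty = pd.DataFrame({
    "Neighborhood": ["WOODLEY", "WOODLEY", "COUNTRY CLUB", "COUNTRY CLUB"],
    "High School": ["LIBERTY", None, None, "TUSCORORA"],
})

def cleanHS(row):
    # With axis=1, each call receives one row as a Series,
    # so both columns are available by label.
    row = row.copy()
    if row["Neighborhood"] == "WOODLEY":
        row["High School"] = "LIBERTY"      # assignment, not ==
    elif row["Neighborhood"] == "COUNTRY CLUB":
        row["High School"] = "HERITAGE"
    return row

# apply() on the DataFrame (not on a single column) passes whole rows.
cleaned = dirty.apply(cleanHS, axis=1)

print(cleaned)
```

The dictionary-based map is usually the better choice: it is shorter and avoids running a Python function for every row.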