I am trying to edit a dataset using pandas .map, as shown in the code below:
df['Region'] = df['Region'].astype('category')
reg = df['Region']
cats = reg.cat.categories
ncats = len(cats)
n = len(os)
north = (...)
south = (...)
center = (...)
islands = (...)
d1 = {cats[i]:'South' for i in range(ncats) if cats[i] in south}
d2 = {cats[i]:'North' for i in range(ncats) if cats[i] in north}
d3 = {cats[i]:'Center' for i in range(ncats) if cats[i] in center}
d4 = {cats[i]:'Islands' for i in range(ncats) if cats[i] in islands}
df['Reg_cat'] = df['Region'].map(d1)
df['Reg_cat'] = df['Region'].map(d2)
df['Reg_cat'] = df['Region'].map(d3)
df['Reg_cat'] = df['Region'].map(d4)
df['Reg_cat'] = df['Reg_cat'].astype('category')
df['Reg_cat'].cat.categories
df['Reg_cat']
The code runs, but only the last .map call takes effect. In this case that is d4; if d1 were last, d1 would be the one applied. What am I doing wrong?
Answer 0 (score: 3)
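Each successive map call replaces every value that is not a key in the mapper with NaN, and each assignment overwrites the result of the previous call.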
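To illustrate the NaN behaviour described above, here is a minimal sketch. The region names and the south_only dictionary are made up for the example; they are not from the question's data.

```python
import pandas as pd

# Hypothetical region names, used only for illustration
regions = pd.Series(["Lazio", "Calabria", "Veneto"])

# A mapper that covers only one of the three values
south_only = {"Calabria": "South"}

# Values that are not keys of the mapper come back as NaN
mapped = regions.map(south_only)
print(mapped)
```

Only the row holding "Calabria" gets a label; the other two rows become NaN, which is why a later map call with a different partial dictionary cannot build on the earlier result.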
Try building a single dictionary and passing that instead.
m = {'North': north, 'South': south, 'Center': center, 'Islands': islands}
# Invert the grouping so that each region name maps to its group label
d = {v2: k for k, v in m.items() for v2 in v}
df['Reg_cat'] = df['Region'].map(d)
Note: with this approach, reg, cats, ncats, and n are no longer needed.
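The single-dictionary approach from this answer can be sketched end to end as follows. The question elides the contents of north, south, center, and islands, so the tuples and the sample DataFrame here are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical group members; the real tuples are elided in the question
north = ("Veneto", "Piemonte")
south = ("Calabria", "Puglia")
center = ("Lazio", "Toscana")
islands = ("Sicilia", "Sardegna")

# Hypothetical sample data
df = pd.DataFrame({"Region": ["Lazio", "Sicilia", "Veneto", "Puglia", "Sardegna"]})

# Group label -> members, then inverted to member -> group label
m = {"North": north, "South": south, "Center": center, "Islands": islands}
d = {region: label for label, members in m.items() for region in members}

# One map call covers every region, so nothing is overwritten or left as NaN
df["Reg_cat"] = df["Region"].map(d).astype("category")
print(df)
```

Because every region appears as a key in d, a single map call labels all rows at once.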
Answer 1 (score: 0)
Every time you call df['Reg_cat'] = df['Region'].map(d#), you overwrite the values in df['Reg_cat']. If you want to keep all of the values, consider adding them as separate columns.
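A minimal sketch of the separate-columns idea, assuming small hypothetical dictionaries and column names (Reg_south, Reg_north, Reg_center) that are not in the original question. The final fillna chain is one possible way to collapse the partial columns back into one; it is an addition for illustration, not part of this answer.

```python
import pandas as pd

# Hypothetical sample data and partial mappers
df = pd.DataFrame({"Region": ["Lazio", "Puglia", "Veneto"]})
d1 = {"Puglia": "South"}
d2 = {"Veneto": "North"}
d3 = {"Lazio": "Center"}

# Write each partial mapping to its own column so nothing is overwritten
for name, mapper in {"Reg_south": d1, "Reg_north": d2, "Reg_center": d3}.items():
    df[name] = df["Region"].map(mapper)

# Optionally collapse the partial columns: take the first non-NaN label per row
df["Reg_cat"] = df["Reg_south"].fillna(df["Reg_north"]).fillna(df["Reg_center"])
print(df)
```

Each partial column keeps NaN where its mapper has no key, and the combined column picks up whichever label exists for that row.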