My DataFrame looks like this:
   lname     fname    rno_cd  eri_cd
0  CRUISE    TOM      E       1
1  DEPP      JOHNNY   Y       0
2  DICAPR    LENARDO          1
3  PITT      BRAD             1
4  MOST      JEFF     A       0
5  HANKS     TOM              1
6  BRANDO    MARLON   C       1
7  WILLIAMS  ROBIN    F       1
8  DOWNEY    ROBERT   B       1
9  PACINO    AL       E       1
The codes in column ['rno_cd'] are defined as:
A = AI/AK Native
B = Asian
C = Black/AA
D = Hispanic
E = White
F = Asian
G = Asian
H = Haw/Pac Isl.
Y = White
1) I need to decode these codes and put the result in a new column. 2) I also need to account for the blank values somehow.
The end result should look like this:
   lname     fname    rno_cd  eri_cd  rno_defined
0  CRUISE    TOM      E       1       White
1  DEPP      JOHNNY   Y       0       White
2  DICAPR    LENARDO          1       Unknown
3  PITT      BRAD             1       Unknown
4  MOST      JEFF     A       0       AI/AK Native
5  HANKS     TOM              1       Unknown
6  BRANDO    MARLON   C       1       Black/AA
7  WILLIAMS  ROBIN    F       1       Asian
8  DOWNEY    ROBERT   B       1       Asian
9  PACINO    AL       E       1       White
====================== My code so far ==================
I used the following, but I'm not sure it is a robust solution.
In[1]:
df1['rno_cd'][df1.rno_cd.str.contains('A')] = 'AI/AK Native'
df1['rno_cd'][df1.rno_cd.str.contains('B')] = 'Asian'
df1['rno_cd'][df1.rno_cd.str.contains('C')] = 'Black/AA'
df1['rno_cd'][df1.rno_cd.str.contains('D')] = 'Hispanic'
df1['rno_cd'][df1.rno_cd.str.contains('E')] = 'White'
df1['rno_cd'][df1.rno_cd.str.contains('F')] = 'Asian'
df1['rno_cd'][df1.rno_cd.str.contains('G')] = 'Asian'
df1['rno_cd'][df1.rno_cd.str.contains('H')] = 'HawPac'
df1['rno_cd'][df1.rno_cd.str.contains('Y')] = 'White'
In[1]: df1
Out[1]:
    lname      fname      rno_cd  eri_cd
0   SONJU      LAURIE     White   1
1   FORTHOFER  KELLY      White   0
2   PLILEY     JODY               1
3   NOEL       HEATHER            1
4   MANNING    CYNTHIA    White   0
5   NAUERTZ    ELIZABETH          1
6   SCHMID     DAVID      White   1
7   HINTHER    VICTORIA   White   1
8   JOHNSON    B.         White   1
9   MOORE      CAROL      White   1
10  MARSHALL   JOY                1
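The limitation of this code is that it does not assign a value to the blanks in the original dataset. It also overwrites the original codes, so I can no longer see them to verify that the values are correct.
Any advice or suggestions?
Thanks.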
Answer 0 (score: 2)
A Series (for example, a column of a DataFrame) has a convenient map method. You just need to express your encoding as a dictionary:
code_to_ethnicity = {'A': 'AI/AK Native',
                     'B': 'Asian'}  # etc.
df['rno_defined'] = df['rno_cd'].map(code_to_ethnicity)
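When you say 'blank values', I assume you mean empty strings: ''. If you want to do something special for those, you can add an entry for them directly to the dictionary.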
code_to_ethnicity = {'A': 'AI/AK Native',
                     'B': 'Asian',
                     '': 'other'}
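A fuller sketch of the map approach, assuming the blanks arrive either as empty strings or as NaN after loading. The small sample frame and the "Unknown" label are illustrative and taken from the desired output in the question. Values that are not keys in the dictionary come back from map as NaN, so a fillna afterwards can label them all at once:

```python
import pandas as pd

# Small sample frame; the blank rno_cd value is assumed to be an empty string.
df = pd.DataFrame({
    "lname": ["CRUISE", "DEPP", "DICAPR", "MOST"],
    "rno_cd": ["E", "Y", "", "A"],
})

code_to_ethnicity = {
    "A": "AI/AK Native", "B": "Asian", "C": "Black/AA", "D": "Hispanic",
    "E": "White", "F": "Asian", "G": "Asian", "H": "Haw/Pac Isl.", "Y": "White",
}

# map() yields NaN for every value that is not a key in the dictionary
# (including '' and NaN), so fillna() labels all of those as "Unknown".
df["rno_defined"] = df["rno_cd"].map(code_to_ethnicity).fillna("Unknown")
print(df)
```

Compared with adding '' to the dictionary, the fillna step also covers cells that were read in as NaN or that hold an unexpected code. It writes to a new column, so the original codes stay available for checking.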
Answer 1 (score: 1)
You can build a dictionary in which the keys are the codes and the values are the names.
D={"A":"AI/AK Native","B":"Asian","C":"Black/AA","D":"Hispanic","E":"White","F":"Asian","G":"Asian","H":"Haw/Pac Isl","Y":"White"}
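Then go through the rno_cd column and apply a function that converts the data. You can use apply with a function transform that checks whether x is a key: if it is, look up the value with D[x]; if not, just return "Unknown".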
A complete example:
import pandas as pd
from io import StringIO

data = """lname fname rno_cd eri_cd
0 CRUISE TOM E 1
1 DEPP JOHNNY Y 0
2 DICAPR LENARDO Nan 1
3 PITT BRAD Nan 1
4 MOST JEFF A 0
5 HANKS TOM Nan 1
6 BRANDO MARLON C 1
7 WILLIAMS ROBIN F 1
8 DOWNEY ROBERT B 1
9 PACINO AL E 1"""

# Each data row has one more field than the header, so pandas uses the first field as the index.
df = pd.read_csv(StringIO(data), sep=r"\s+")
D = {"A":"AI/AK Native","B":"Asian","C":"Black/AA","D":"Hispanic","E":"White","F":"Asian","G":"Asian","H":"Haw/Pac Isl","Y":"White"}

def transform(x):
    # x is one row of the DataFrame; the string "Nan" marks a blank code in this sample.
    if x['rno_cd'] == "Nan":
        return "Unknown"
    else:
        return D[x['rno_cd']]

df["rno_defined"] = df.apply(transform, axis=1)
print(df)
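Another approach, as a one-liner (in Python 3, map returns an iterator, so wrap it in list()):
df["rno_defined"] = list(map(lambda x: D[x] if x != "Nan" else "Unknown", df['rno_cd'].values))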