I have the following DataFrame in pandas:
job_desig          salary
senior analyst     12
junior researcher  5
scientist          20
sr analyst         12
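Now I want to generate a flag column, set to 1 when job_desig contains any value from the list below, as in the expected output that follows: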
sr = ['senior','sr']
job_desig          salary  senior_profile
senior analyst     12      1
junior researcher  5       0
scientist          20      0
sr analyst         12      1
This is what I am trying in pandas:
df['senior_profile'] = [1 if x.str.contains(sr) else 0 for x in
df['job_desig']]
Answer 0 (score: 5)
You can join all values of the list with | (regex OR), pass the resulting pattern to Series.str.contains, and finally cast the True/False result to integer 1/0:
df['senior_profile'] = df['job_desig'].str.contains('|'.join(sr)).astype(int)
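For anyone who wants to run the one-liner above end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample data in the question; the variable name `pattern` is only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "job_desig": ["senior analyst", "junior researcher", "scientist", "sr analyst"],
    "salary": [12, 5, 20, 12],
})
sr = ["senior", "sr"]

# "senior|sr" is a regex alternation: it matches if either keyword occurs.
pattern = "|".join(sr)

# str.contains returns a boolean Series; astype(int) maps True/False to 1/0.
df["senior_profile"] = df["job_desig"].str.contains(pattern).astype(int)
print(df)
```

If job_desig may contain missing values or mixed case, the `na=False` and `case=False` parameters of `Series.str.contains` are probably worth a look.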
If necessary, add word boundaries so that only whole words match:
pat = '|'.join(r"\b{}\b".format(x) for x in sr)
df['senior_profile'] = df['job_desig'].str.contains(pat).astype(int)
print (df)
           job_desig  salary  senior_profile
0     senior analyst      12               1
1  junior researcher       5               0
2          scientist      20               0
3         sr analyst      12               1
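To see why the word boundaries can matter, here is a small sketch comparing the plain pattern with the bounded one. The job titles are hypothetical and not from the question; "usr admin" is chosen because it contains "sr" inside another word. Wrapping each keyword in `re.escape` is an extra precaution I have added, in case a keyword ever contains regex metacharacters.

```python
import re
import pandas as pd

# Hypothetical job titles, chosen to show a false positive.
titles = pd.Series(["sr analyst", "usr admin", "senior manager"])
sr = ["senior", "sr"]

# Plain alternation: matches the keywords anywhere, even inside other words.
plain = "|".join(sr)

# Bounded alternation: \b requires a word boundary on both sides.
bounded = "|".join(r"\b{}\b".format(re.escape(x)) for x in sr)

plain_flags = titles.str.contains(plain).astype(int).tolist()
bounded_flags = titles.str.contains(bounded).astype(int).tolist()

print(plain_flags)    # [1, 1, 1]  "usr admin" is flagged by mistake
print(bounded_flags)  # [1, 0, 1]  only whole-word matches are flagged
```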
If the list contains only single-word values, here is a solution using sets:
df['senior_profile'] = [int(bool(set(sr).intersection(x.split()))) for x in df['job_desig']]
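The set-based line above can be unpacked roughly like this. It is only a sketch: the helper `has_keyword` and the extra row "usr admin" are hypothetical additions for illustration, the other rows come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "job_desig": ["senior analyst", "junior researcher", "scientist",
                  "sr analyst", "usr admin"],  # last row is hypothetical
})
keywords = {"senior", "sr"}

def has_keyword(title):
    # Split the title on whitespace and check for any shared token.
    return int(bool(keywords.intersection(title.split())))

df["senior_profile"] = [has_keyword(t) for t in df["job_desig"]]
print(df["senior_profile"].tolist())  # [1, 0, 0, 1, 0]
```

Note that this is an exact, case-sensitive token comparison, so a title such as "Senior analyst" or "sr. analyst" would not be flagged unless the text is normalized first.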
Answer 1 (score: 4)
You can simply use str.contains:
df['senior_profile'] = df['job_desig'].str.contains('senior') | df['job_desig'].str.contains('sr')
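One thing to be aware of: combining two `str.contains` results with `|` gives a boolean column, while the question asks for 1/0. A minimal sketch of the same idea with a final cast, using the sample data from the question (the name `mask` is only illustrative):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "job_desig": ["senior analyst", "junior researcher", "scientist", "sr analyst"],
    "salary": [12, 5, 20, 12],
})

# Element-wise OR of two boolean Series.
mask = df["job_desig"].str.contains("senior") | df["job_desig"].str.contains("sr")

# Cast True/False to 1/0 to match the expected output in the question.
df["senior_profile"] = mask.astype(int)
print(df)
```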