I have a DataFrame like this:
Cause_of_death famous_for name nationality
suicide by hanging African jazz XYZ South
unknown Korean president ABC South
heart attack businessman EFG American
heart failure Prime Minister LMN Indian
heart problems African writer PQR South
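The DataFrame is very large. What I want to do is change the nationality column. As you can see, for nationality = South, the famous_for column contains Korean or African as part of the string. So I want to change nationality to South Africa if famous_for contains African, and to South Korea if famous_for contains Korean.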
What I have tried is:
for i in deaths['nationality']:
    if (deaths['nationality']=='South'):
        if deaths['famous_for'].contains('Korea'):
            deaths['nationality']='South Korea'
        elif deaths['famous_for'].contains('Korea'):
            deaths['nationality']='South Africa'
        else:
            pass
Answer 0: (score: 2)
You can use str.contains() to check whether the famous_for column contains Korean or Africa, and set nationality accordingly.
df.loc[df.famous_for.str.contains('Korean'), 'nationality']='South Korea'
df.loc[df.famous_for.str.contains('Africa'), 'nationality']='South Africa'
df
Out[783]:
Cause_of_death famous_for name nationality
0 suicide by hanging African jazz XYZ South Africa
1 unknown Korean president ABC South Korea
2 heart attack businessman EFG American
3 heart failure Prime Minister LMN Indian
4 heart problems African writer PQR South Africa
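One caveat worth sketching: the two .loc lines above match on famous_for alone and do not check that nationality is 'South'. The example below uses a few rows from the question plus one hypothetical row (name 'HYP', not in the original data) to show the difference a nationality guard makes. It is only an illustration under those assumptions.

```python
import pandas as pd

# Rows from the question, plus one hypothetical row ('HYP') whose
# nationality is not 'South' but whose famous_for mentions Africa.
df = pd.DataFrame({
    'famous_for': ['African jazz', 'Korean president', 'businessman',
                   'scholar of African history'],
    'name': ['XYZ', 'ABC', 'EFG', 'HYP'],
    'nationality': ['South', 'South', 'American', 'British'],
})

# Without a guard, every row mentioning Africa is overwritten.
unguarded = df.copy()
unguarded.loc[unguarded.famous_for.str.contains('Africa'), 'nationality'] = 'South Africa'

# With a guard, only rows whose nationality is 'South' are touched.
guarded = df.copy()
is_south = guarded.nationality == 'South'
guarded.loc[is_south & guarded.famous_for.str.contains('Africa'), 'nationality'] = 'South Africa'

print(unguarded.nationality.tolist())
# ['South Africa', 'South', 'American', 'South Africa']
print(guarded.nationality.tolist())
# ['South Africa', 'South', 'American', 'British']
```

If every row that mentions Korea or Africa really does have nationality 'South', the guard changes nothing and the shorter form is fine.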
Or you can do it in a single statement using:
df.nationality = (
    df.nationality.str.cat(df.famous_for.str.extract('(Africa|Korea)', expand=False),
                           sep=' ', na_rep='')
    .str.strip())  # strip the trailing space left on rows with no match
df
Out[801]:
Cause_of_death famous_for name nationality
0 suicide by hanging African jazz XYZ South Africa
1 unknown Korean president ABC South Korea
2 heart attack businessman EFG American
3 heart failure Prime Minister LMN Indian
4 heart problems African writer PQR South Africa
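To see what the str.extract / str.cat combination does step by step, here is a small sketch on three of the sample rows. The variable names suffix and combined are illustrative only. Note that with sep=' ' and na_rep='', rows where nothing was extracted end up with a trailing space, which .str.strip() removes.

```python
import pandas as pd

famous_for = pd.Series(['African jazz', 'Korean president', 'businessman'])
nationality = pd.Series(['South', 'South', 'American'])

# The capture group returns the matched keyword, or NaN when nothing matches.
suffix = famous_for.str.extract('(Africa|Korea)', expand=False)
print(suffix.tolist())

# NaN is replaced by '' before joining, so non-matching rows become
# 'American' + ' ' + '' and keep a trailing space.
combined = nationality.str.cat(suffix, sep=' ', na_rep='')
print(combined.tolist())
# ['South Africa', 'South Korea', 'American ']

print(combined.str.strip().tolist())
# ['South Africa', 'South Korea', 'American']
```

Like the .loc version, this appends the keyword to every matching row regardless of its current nationality.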
Answer 1: (score: 1)
If there are many possible conditions, use a custom function with DataFrame.apply and axis=1 to process the data row by row:
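def f(x):
    # Only rows with the ambiguous 'South' nationality are candidates.
    if x['nationality'] == 'South':
        if 'Korea' in x['famous_for']:
            return 'South Korea'
        elif 'Africa' in x['famous_for']:
            return 'South Africa'
    # Otherwise keep the existing value (also for 'South' rows with no match).
    return x['nationality']

deaths['nationality'] = deaths.apply(f, axis=1)
print(deaths)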
Cause_of_death famous_for name nationality
0 suicide by hanging African jazz XYZ South Africa
1 unknown Korean president ABC South Korea
2 heart attack businessman EFG American
3 heart failure Prime Minister LMN Indian
4 heart problems African writer PQR South Africa
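Since the question mentions that the frame is very large, a row-wise apply may be slow. As a possible alternative (not the method of this answer), the keyword-to-value pairs can be kept in a dictionary and applied with one vectorized .loc assignment per keyword. The dictionary name replacements is an assumption made for this sketch.

```python
import pandas as pd

deaths = pd.DataFrame({
    'famous_for': ['African jazz', 'Korean president', 'businessman',
                   'Prime Minister', 'African writer'],
    'name': ['XYZ', 'ABC', 'EFG', 'LMN', 'PQR'],
    'nationality': ['South', 'South', 'American', 'Indian', 'South'],
})

# Hypothetical mapping: keyword searched in famous_for -> new nationality.
replacements = {'Korea': 'South Korea', 'Africa': 'South Africa'}

# Computed once, before any value is overwritten.
is_south = deaths['nationality'] == 'South'

for keyword, new_value in replacements.items():
    # regex=False treats the keyword as a plain substring.
    hit = is_south & deaths['famous_for'].str.contains(keyword, regex=False)
    deaths.loc[hit, 'nationality'] = new_value

print(deaths['nationality'].tolist())
# ['South Africa', 'South Korea', 'American', 'Indian', 'South Africa']
```

If a row matched more than one keyword, the last matching entry in the dictionary would win, so the order of the entries matters in that case.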
But if there are only a few conditions, use str.contains together with DataFrame.loc:
mask1 = deaths['nationality'] == 'South'
mask2 = deaths['famous_for'].str.contains('Korean')
mask3 = deaths['famous_for'].str.contains('Africa')
deaths.loc[mask1 & mask2, 'nationality']='South Korea'
deaths.loc[mask1 & mask3, 'nationality']='South Africa'
print (deaths)
Cause_of_death famous_for name nationality
0 suicide by hanging African jazz XYZ South Africa
1 unknown Korean president ABC South Korea
2 heart attack businessman EFG American
3 heart failure Prime Minister LMN Indian
4 heart problems African writer PQR South Africa
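If famous_for can contain missing values, str.contains returns a missing value rather than False for those rows. Passing na=False keeps the masks strictly boolean. The sketch below assumes a small frame with one NaN entry, which is not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the second row has no famous_for value.
deaths = pd.DataFrame({
    'famous_for': ['African jazz', np.nan, 'Korean president'],
    'nationality': ['South', 'South', 'South'],
})

mask1 = deaths['nationality'] == 'South'
# na=False makes rows with a missing famous_for count as "no match".
mask2 = deaths['famous_for'].str.contains('Korea', na=False)
mask3 = deaths['famous_for'].str.contains('Africa', na=False)

deaths.loc[mask1 & mask2, 'nationality'] = 'South Korea'
deaths.loc[mask1 & mask3, 'nationality'] = 'South Africa'

print(deaths['nationality'].tolist())
# ['South Africa', 'South', 'South Korea']
```

The row with the missing value is left as 'South', since there is nothing to decide on.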
Another solution, using Series.mask:
mask1 = deaths['nationality'] == 'South'
mask2 = deaths['famous_for'].str.contains('Korean')
mask3 = deaths['famous_for'].str.contains('Africa')
deaths['nationality'] = deaths['nationality'].mask(mask1 & mask2, 'South Korea')
deaths['nationality'] = deaths['nationality'].mask(mask1 & mask3,'South Africa')
print (deaths)
Cause_of_death famous_for name nationality
0 suicide by hanging African jazz XYZ South Africa
1 unknown Korean president ABC South Korea
2 heart attack businessman EFG American
3 heart failure Prime Minister LMN Indian
4 heart problems African writer PQR South Africa