How do I change the values in a DataFrame column while iterating over that column?

Time: 2017-06-24 07:35:25

Tags: python pandas dataframe

I have a DataFrame like this:

Cause_of_death       famous_for          name         nationality
suicide by hanging   African jazz        XYZ             South
unknown              Korean president    ABC             South
heart attack         businessman         EFG             American
heart failure        Prime Minister      LMN             Indian
heart problems       African writer      PQR             South

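The DataFrame is very large. What I want to do is change values in the nationality column. As you can see, where nationality = South, the famous_for column contains Korean or African as part of its string. So I want to change nationality to South Africa if famous_for contains African, and to South Korea if famous_for contains Korean.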

What I have tried:

for i in deaths['nationality']:
    if (deaths['nationality']=='South'):
        if deaths['famous_for'].contains('Korea'):
            deaths['nationality']='South Korea'
        elif deaths['famous_for'].contains('Korea'):
            deaths['nationality']='South Africa'
        else:
            pass

2 answers:

Answer 0: (score: 2)

You can use str.contains() to check whether the famous_for column contains Korean or Africa, and set nationality accordingly.

df.loc[df.famous_for.str.contains('Korean'), 'nationality']='South Korea'

df.loc[df.famous_for.str.contains('Africa'), 'nationality']='South Africa'

df
Out[783]: 
       Cause_of_death        famous_for  name   nationality
0  suicide by hanging      African jazz   XYZ  South Africa
1             unknown  Korean president   ABC   South Korea
2        heart attack       businessman   EFG      American
3       heart failure    Prime Minister   LMN        Indian
4      heart problems    African writer   PQR  South Africa
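
Two things the lines above do not handle: they update any row whose famous_for matches, not only rows where nationality is 'South', and they assume famous_for has no missing values. The following is a minimal sketch on a small made-up frame (the NaN row is an assumption, not part of the question's data) that adds a 'South' check and passes na=False so missing values count as non-matches:

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the NaN row is an assumption, not from the question.
df = pd.DataFrame({
    'famous_for': ['African jazz', 'Korean president', np.nan, 'businessman'],
    'nationality': ['South', 'South', 'South', 'American'],
})

is_south = df['nationality'] == 'South'

# na=False makes a missing famous_for value count as "no match".
korea = df['famous_for'].str.contains('Korea', na=False)
africa = df['famous_for'].str.contains('Africa', na=False)

# Only rows that are 'South' AND match the keyword are updated.
df.loc[is_south & korea, 'nationality'] = 'South Korea'
df.loc[is_south & africa, 'nationality'] = 'South Africa'

print(df)
```

The row with a missing famous_for keeps 'South', since neither keyword can be confirmed for it.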

Or you can do it in a single statement using:

df.nationality = (
    df.nationality.str.cat(df.famous_for.str.extract('(Africa|Korea)',expand=False),
                           sep=' ', na_rep=''))

df
Out[801]: 
       Cause_of_death        famous_for  name    nationality
0  suicide by hanging      African jazz   XYZ   South Africa
1             unknown  Korean president   ABC    South Korea
2        heart attack       businessman   EFG      American 
3       heart failure    Prime Minister   LMN        Indian 
4      heart problems    African writer   PQR   South Africa
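
Note that with sep=' ' and na_rep='', rows without a match end up with a trailing space ('American '), as the output above shows. A minimal sketch, assuming that trailing space is unwanted, removes it with str.strip() on a small made-up frame:

```python
import pandas as pd

# Small illustrative frame modelled on the question's data.
df = pd.DataFrame({
    'famous_for': ['African jazz', 'Korean president', 'businessman'],
    'nationality': ['South', 'South', 'American'],
})

# Pull out 'Africa' or 'Korea' where present; rows without a match give NaN.
suffix = df['famous_for'].str.extract('(Africa|Korea)', expand=False)

# Join with a space, then strip the trailing space left on non-matching rows.
df['nationality'] = (
    df['nationality'].str.cat(suffix, sep=' ', na_rep='').str.strip()
)

print(df['nationality'].tolist())
```

This approach appends the extracted word to every nationality, not only to 'South', so it fits best when only 'South' rows can match.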

Answer 1: (score: 1)

If there are many possible conditions, use a custom function with DataFrame.apply and axis=1 to process the data row by row:

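def f(x):
    # Only rows whose nationality is exactly 'South' are candidates for a change.
    if x['nationality'] == 'South':
        if 'Korea' in x['famous_for']:
            return 'South Korea'
        elif 'Africa' in x['famous_for']:
            return 'South Africa'
    # Every other row, including a 'South' row with no keyword, keeps its value.
    return x['nationality']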


deaths['nationality'] = deaths.apply(f, axis=1)
print (deaths)
       Cause_of_death        famous_for name   nationality
0  suicide by hanging      African jazz  XYZ  South Africa
1             unknown  Korean president  ABC   South Korea
2        heart attack       businessman  EFG      American
3       heart failure    Prime Minister  LMN        Indian
4      heart problems    African writer  PQR  South Africa
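
Since the question mentions that the DataFrame is very large, a row-wise apply may be slow. As a possible vectorized alternative (not part of the original answer), numpy.select takes a list of conditions and a matching list of values. The sketch below rebuilds the sample data by hand:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (only the two relevant columns).
deaths = pd.DataFrame({
    'famous_for': ['African jazz', 'Korean president', 'businessman',
                   'Prime Minister', 'African writer'],
    'nationality': ['South', 'South', 'American', 'Indian', 'South'],
})

is_south = (deaths['nationality'] == 'South').to_numpy()
has_korea = deaths['famous_for'].str.contains('Korea', na=False).to_numpy()
has_africa = deaths['famous_for'].str.contains('Africa', na=False).to_numpy()

# Conditions are checked in order; the first one that is True wins.
conditions = [is_south & has_korea, is_south & has_africa]
choices = ['South Korea', 'South Africa']

# Rows matching no condition keep their current nationality.
deaths['nationality'] = np.select(
    conditions, choices, default=deaths['nationality'].to_numpy(dtype=object)
)

print(deaths)
```

Adding another rule means appending one entry to conditions and one to choices.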

But if there are only a few conditions, use str.contains together with DataFrame.loc:

mask1 = deaths['nationality'] == 'South'
mask2 = deaths['famous_for'].str.contains('Korean')
mask3 = deaths['famous_for'].str.contains('Africa')

deaths.loc[mask1 & mask2, 'nationality']='South Korea'
deaths.loc[mask1 & mask3, 'nationality']='South Africa'
print (deaths)
       Cause_of_death        famous_for name   nationality
0  suicide by hanging      African jazz  XYZ  South Africa
1             unknown  Korean president  ABC   South Korea
2        heart attack       businessman  EFG      American
3       heart failure    Prime Minister  LMN        Indian
4      heart problems    African writer  PQR  South Africa
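
If the number of keyword rules grows, the same loc pattern can be driven by a dictionary instead of one mask variable per rule. This is a minimal sketch; the name replacements is hypothetical and the sample data is rebuilt by hand:

```python
import pandas as pd

# Sample data rebuilt from the question (only the two relevant columns).
deaths = pd.DataFrame({
    'famous_for': ['African jazz', 'Korean president', 'businessman',
                   'Prime Minister', 'African writer'],
    'nationality': ['South', 'South', 'American', 'Indian', 'South'],
})

# Hypothetical mapping: keyword to look for in famous_for -> new nationality.
replacements = {'Korea': 'South Korea', 'Africa': 'South Africa'}

# Computed once, before any update, so later rules still see the 'South' rows.
is_south = deaths['nationality'] == 'South'

for keyword, country in replacements.items():
    # regex=False treats the keyword as a literal substring.
    hit = is_south & deaths['famous_for'].str.contains(keyword, regex=False, na=False)
    deaths.loc[hit, 'nationality'] = country

print(deaths)
```

If a row matches more than one keyword, the rule that comes later in the dictionary overwrites the earlier one.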

Another solution, using mask:

mask1 = deaths['nationality'] == 'South'
mask2 = deaths['famous_for'].str.contains('Korean')
mask3 = deaths['famous_for'].str.contains('Africa')

deaths['nationality'] = deaths['nationality'].mask(mask1 & mask2, 'South Korea')
deaths['nationality'] = deaths['nationality'].mask(mask1 & mask3,'South Africa')
print (deaths)
       Cause_of_death        famous_for name   nationality
0  suicide by hanging      African jazz  XYZ  South Africa
1             unknown  Korean president  ABC   South Korea
2        heart attack       businessman  EFG      American
3       heart failure    Prime Minister  LMN        Indian
4      heart problems    African writer  PQR  South Africa