Change a DataFrame column based on a condition

Time: 2018-07-11 16:40:44

Tags: python python-3.x pandas

Sample DataFrame

CountryName

India|Pakistan
Pakistan|Agansitan
Sweden
Nepal|Bhutan

Desired output, a DataFrame with a new column

CountryName           MainCountry

India|Pakistan        India
Pakistan|Agansitan    Pakistan
Sweden                Sweden
Nepal|Bhutan          Nepal

I tried

df["MainCountry"] =df['CountryName'].str.contains("[|].*","")

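but this returns True/False values instead of the country name. Could you help me understand how to get the output shown above?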

4 answers:

Answer 0 (score: 3)

You can split on '|' and take the first element:

In [87]: df['MainCountry'] = df['CountryName'].str.split('|').str[0]

In [88]: df
Out[88]:
          CountryName MainCountry
0      India|Pakistan       India
1  Pakistan|Agansitan    Pakistan
2              Sweden      Sweden
3        Nepal|Bhutan       Nepal
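
For readers who want to reproduce this outside an IPython session, here is a minimal self-contained sketch. It assumes the sample data from the question; the variable name `has_pipe` is only illustrative. It also shows why the attempt in the question produced booleans: `str.contains` only tests for a match and never returns the matched text.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "CountryName": ["India|Pakistan", "Pakistan|Agansitan", "Sweden", "Nepal|Bhutan"]
})

# str.contains only tests for a match, so the result is boolean, not text.
# regex=False makes "|" a literal character rather than regex alternation.
has_pipe = df["CountryName"].str.contains("|", regex=False)

# Split on the literal "|" and keep element 0, the first country.
# Rows without a "|" split into a one-element list, so they are kept as-is.
df["MainCountry"] = df["CountryName"].str.split("|").str[0]

print(has_pipe.tolist())
print(df)
```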

Answer 1 (score: 3)

Use str.extract

df.assign(MainCountry=df.CountryName.str.extract(r'(.*?)(?:\||$)'))

          CountryName MainCountry
0      India|Pakistan       India
1  Pakistan|Agansitan    Pakistan
2              Sweden      Sweden
3        Nepal|Bhutan       Nepal 

Or use str.partition

df.assign(MainCountry=df.CountryName.str.partition('|')[0])

          CountryName MainCountry
0      India|Pakistan       India
1  Pakistan|Agansitan    Pakistan
2              Sweden      Sweden
3        Nepal|Bhutan       Nepal
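
A hedged sketch of both approaches side by side, assuming the question's sample data. The pattern `^([^|]*)` is a simpler alternative to the one used above, not the answerer's own, and `expand=False` is added so that `str.extract` returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "CountryName": ["India|Pakistan", "Pakistan|Agansitan", "Sweden", "Nepal|Bhutan"]
})

# Capture everything before the first "|" (or the whole string if there is none).
# expand=False yields a Series because the pattern has a single capture group.
extracted = df["CountryName"].str.extract(r"^([^|]*)", expand=False)

# partition splits at the first "|" into three columns:
# 0 = text before, 1 = the separator, 2 = text after.
partitioned = df["CountryName"].str.partition("|")[0]

# assign returns a new DataFrame and leaves df unchanged.
result = df.assign(MainCountry=extracted)
print(result)
```

Note that `df.assign` does not modify `df` in place; bind the result to a name if you want to keep it.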

Answer 2 (score: 2)

Use str.split together with str.get

df.CountryName.str.split('|').str.get(0)
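
The expression above only computes a Series; it still has to be assigned to a column. A small sketch, assuming a shortened version of the sample data plus one hypothetical missing value (not present in the question) to show that missing entries stay missing rather than raising an error:

```python
import pandas as pd

# The None row is a hypothetical addition for illustration only.
df = pd.DataFrame({"CountryName": ["India|Pakistan", "Sweden", None]})

# .str.get(0) takes element 0 of each split list, like .str[0].
df["MainCountry"] = df["CountryName"].str.split("|").str.get(0)

print(df)
```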

Answer 3 (score: 0)

Use np.where

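import numpy as np

# regex=False treats '|' literally; as a regex, a bare '|' matches every string
df['Main_Country'] = (np.where(df['CountryName'].str.contains('|', regex=False),
                  df['CountryName'].str.split('|').str[0],
                  df['CountryName']))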

Output:

          CountryName Main_Country
0      India|Pakistan        India
1  Pakistan|Agansitan     Pakistan
2              Sweden       Sweden
3        Nepal|Bhutan        Nepal
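
One caveat with this approach: `str.contains` interprets its pattern as a regular expression by default, and an unescaped `|` is the alternation operator, which matches the empty string. The sketch below, using a two-row illustrative Series rather than the full sample, compares the default behaviour with a literal match; the mask names are hypothetical.

```python
import numpy as np
import pandas as pd

names = pd.Series(["India|Pakistan", "Sweden"])

# Default: the pattern is a regex, and a bare "|" matches the empty string.
regex_mask = names.str.contains("|")

# Two ways to test for a literal pipe character.
literal_mask = names.str.contains("|", regex=False)
escaped_mask = names.str.contains(r"\|")

# Take the first country where a pipe exists, otherwise keep the value.
main = np.where(literal_mask, names.str.split("|").str[0], names)

print(regex_mask.tolist(), literal_mask.tolist(), list(main))
```

Because `str.split('|').str[0]` already returns the whole string when no pipe is present, the `np.where` branch is not strictly needed for this data; it mainly helps when the two branches should do different things.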