How do I replace country names in a DataFrame column with continent names?

Time: 2018-05-01 07:45:04

Tags: python-3.x pandas dataframe

I have a DataFrame like this.

problem.head(30)
Out[25]: 
     Country
0     Sweden
1     Africa
2     Africa
3     Africa
4     Africa
5    Germany
6    Germany
7    Germany
8    Germany
9         UK
10   Germany
11   Germany
12   Germany
13   Germany
14    Sweden
15    Sweden
16    Africa
17    Africa
18    Africa
19    Africa
20    Africa
21    Africa
22    Africa
23    Africa
24    Africa
25    Africa
26  Pakistan
27  Pakistan
28        ZA
29        ZA

Now I want to replace each country name with the name of its continent.

What I did was create one list per continent (my DataFrame contains 56 countries):

asia = ['Afghanistan', 'Bahrain', 'United Arab Emirates','Saudi Arabia', 'Kuwait', 'Qatar', 'Oman',
    'Sultanate of Oman','Lebanon', 'Iraq', 'Yemen', 'Pakistan', 'Lebanon', 'Philippines', 'Jordan']
europe = ['Germany','Spain', 'France', 'Italy', 'Netherlands', 'Norway', 'Sweden','Czech Republic', 'Finland',
      'Denmark', 'Czech Republic', 'Switzerland', 'UK', 'UK&I', 'Poland', 'Greece','Austria',
      'Bulgaria', 'Hungary', 'Luxembourg', 'Romania' , 'Slovakia', 'Estonia', 'Slovenia','Portugal',
      'Croatia', 'Lithuania', 'Latvia','Serbia', 'Estonia', 'ME', 'Iceland' ]
africa = ['Morocco', 'Tunisia', 'Africa', 'ZA', 'Kenya']
other = ['USA', 'Australia', 'Reunion', 'Faroe Islands']

Now I am trying to use replace:
dataframe['Continent'] = dataframe['Country'].replace(asia, 'Asia', regex=True)

where asia is my list and 'Asia' is the replacement text. But it does not work. It only works for

dataframe['Continent'] = dataframe['Country'].replace(np.nan, 'Asia', regex=True)

So, please help.

2 Answers:

Answer 0: (score: 1)

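It is better to store your country-to-continent mapping as a dictionary rather than as four separate lists. You can build it as follows, starting from your current lists: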

continents = {country: 'Asia' for country in asia}
continents.update({country: 'Europe' for country in europe})
continents.update({country: 'Africa' for country in africa})
continents.update({country: 'Other' for country in other})

You can then use the pandas map method to map each country to its continent:

dataframe['Continent'] = dataframe['Country'].map(continents)
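
One thing to watch with map is that any country missing from the dictionary comes back as NaN. The sketch below is only an illustration under assumptions: the shortened continents dictionary and the unmapped country 'Brazil' are made up for the example and are not part of the original data. It lists the countries that still need an entry and then falls back to a catch-all label.

```python
import pandas as pd

# Hypothetical, shortened mapping for illustration only.
continents = {"Sweden": "Europe", "Germany": "Europe", "Pakistan": "Asia", "ZA": "Africa"}

# "Brazil" is an assumed example of a country that is missing from the mapping.
df = pd.DataFrame({"Country": ["Sweden", "Pakistan", "ZA", "Brazil"]})

# map() looks each value up in the dict; values without a key become NaN.
mapped = df["Country"].map(continents)

# List the countries that still need a dictionary entry.
unmapped = sorted(df.loc[mapped.isna(), "Country"].unique())
print(unmapped)

# Fall back to a catch-all label for anything that was not mapped.
df["Continent"] = mapped.fillna("Other")
print(df)
```

Checking the unmapped list before calling fillna makes it easier to spot spelling variants in the data instead of silently labelling them 'Other'.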

Answer 1: (score: 1)

Use apply with a custom function.

Demo:

import pandas as pd
asia = ['Afghanistan', 'Bahrain', 'United Arab Emirates','Saudi Arabia', 'Kuwait', 'Qatar', 'Oman',
    'Sultanate of Oman','Lebanon', 'Iraq', 'Yemen', 'Pakistan', 'Lebanon', 'Philippines', 'Jordan']
europe = ['Germany','Spain', 'France', 'Italy', 'Netherlands', 'Norway', 'Sweden','Czech Republic', 'Finland',
      'Denmark', 'Czech Republic', 'Switzerland', 'UK', 'UK&I', 'Poland', 'Greece','Austria',
      'Bulgaria', 'Hungary', 'Luxembourg', 'Romania' , 'Slovakia', 'Estonia', 'Slovenia','Portugal',
      'Croatia', 'Lithuania', 'Latvia','Serbia', 'Estonia', 'ME', 'Iceland' ]
africa = ['Morocco', 'Tunisia', 'Africa', 'ZA', 'Kenya']
other = ['USA', 'Australia', 'Reunion', 'Faroe Islands']

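def get_continent(country):
    """Return the continent label for a country name."""
    if country in asia:
        return "Asia"
    elif country in europe:
        return "Europe"
    elif country in africa:
        return "Africa"
    else:
        # Anything not in the three lists above falls through to "other".
        return "other"

df = pd.DataFrame({"Country": ["Sweden", "Africa", "Africa", "Germany", "Germany", "UK","Pakistan"]})
# Pass the function directly; wrapping it in a lambda is unnecessary.
df['Continent'] = df['Country'].apply(get_continent)
print(df)

Output: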

    Country Continent
0    Sweden    Europe
1    Africa    Africa
2    Africa    Africa
3   Germany    Europe
4   Germany    Europe
5        UK    Europe
6  Pakistan      Asia
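
Because apply calls the Python function once per row, a vectorized variant may be preferable on larger frames. The following is a rough sketch of the same if/elif logic, swapping in Series.isin and numpy.select, which the answer above does not use. The lists are shortened for illustration, and 'USA' is added to the sample data only to show the default branch.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative lists; the full lists from the demo work the same way.
asia = ["Pakistan", "Jordan"]
europe = ["Germany", "Sweden", "UK"]
africa = ["Africa", "ZA", "Kenya"]

df = pd.DataFrame({"Country": ["Sweden", "Africa", "Germany", "UK", "Pakistan", "USA"]})

# One boolean mask per continent, checked in the same order as the if/elif chain.
conditions = [
    df["Country"].isin(asia).to_numpy(),
    df["Country"].isin(europe).to_numpy(),
    df["Country"].isin(africa).to_numpy(),
]
labels = ["Asia", "Europe", "Africa"]

# Rows matching none of the masks get the default, like the final else branch.
df["Continent"] = np.select(conditions, labels, default="other")
print(df)
```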