Create a new column based on the values of another column in pandas

Time: 2020-04-22 13:58:42

Tags: python numpy

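I have a column in a DataFrame that lists the names of different countries, and I want to create a new column containing each country's region; for example, if the country is India, the region should be Asia. I tried using np.where, but I seem to be doing something wrong. Here is the code I tried: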

Region = np.where(country_name == 'US' , "US", 
                 np.where(country_name == ('Brazil' or 'Canada' or 'Peru' or 'Chile') , "Rest of America", 
                 np.where(country_name == ('South Africa 'or 'Egypt' or 'Morocco' or 'Algeria' or 'Ghana'), "Africa", 
                 np.where(country_name == ('Afghanistan'or 'Armenia'or 'Azerbaijan' or 'Bahrain'or'Bangladesh'or 'Bhutan'or 
                                           'Brunei'or 'Burma'or 'Cambodia'or 'China'or 'East Timor' or
                                           'Georgia'or 'Hong Kong'or 'India' or 'Indonesia'or 'Iran' or 'Iraq'or 'Israel'or 'Japan'or
                                           'Jordan'or 'Kazakhstan'or 'Kuwait'or 'Kyrgyzstan'or 'Laos'or 
                                           'Lebanon'or 'Malaysia' or 'Mongolia'or 'Nepal'or 'North Korea'or 'Oman'or 'Pakistan'|
                                           'Papua New Guinea'or 'Philippines'or 'Qatar'or 'Russia'or 'Saudi Arabia'or 'Singapore'| 
                                           'South Korea'or 'Sri Lanka'or 'Syria'or 'Taiwan'or 'Tajikistan'or 'Thailand'or 'Turkey'or 'Turkmenistan'or
                                           'United Arab Emirates'or 'Uzbekistan'or 'Vietnam'or 'Yemen'), "Asia", 
                 np.where(country_name == ('Spain'or 'Italy' or 'Germany'or 'United Kingdom' or'France'), "Europe", "Unchange")))))

Below is the data:

     Entity        Region   Code       Date   Total confirmed deaths (deaths)   Total confirmed cases (cases)
0   Afghanistan     Asia    AFG     2019-12-31  0   0
1   Afghanistan     Asia    AFG     2020-01-01  0   0
2   Afghanistan     Asia    AFG     2020-01-02  0   0
3   Afghanistan     Asia    AFG     2020-01-03  0   0
4   Afghanistan     Asia    AFG     2020-01-04  0   0
5   Afghanistan     Asia    AFG     2020-01-05  0   0
6   Afghanistan     Asia    AFG     2020-01-06  0   0
7   Afghanistan     Asia    AFG     2020-01-07  0   0
8   Afghanistan     Asia    AFG     2020-01-08  0   0
9   Afghanistan     Asia    AFG     2020-01-09  0   0
10  Afghanistan     Asia    AFG     2020-01-10  0   0
11  Afghanistan     Asia    AFG     2020-01-11  0   0

However, this code only works for the first country in each group, i.e. only for Brazil, South Africa, Afghanistan and Spain.

1 answer:

Answer 0 (score: 0)

import numpy as np

# Countries grouped by region. Names must match the values in df['Entity'] exactly.
# Europe
list_1 = ["Iceland", "Norway", "Sweden", "Finland", "Denmark", "United Kingdom", "Ireland",
          "France", "Belgium", "Netherlands", "Luxembourg", "Monaco", "Portugal", "Spain",
          "Andorra", "Italy", "Malta", "San Marino", "Vatican City", "Germany",
          "Switzerland", "Liechtenstein", "Austria", "Poland", "Czech Republic", "Slovakia",
          "Hungary", "Slovenia", "Croatia", "Bosnia", "Herzegovina", "Serbia", "Montenegro",
          "Albania", "Macedonia", "Romania", "Bulgaria", "Greece", "Estonia", "Latvia",
          "Lithuania", "Belarus", "Ukraine", "Moldova"]
# Rest of America
list_2 = ['Brazil', 'Canada', 'Peru', 'Chile', 'South America']
# Asia
list_3 = ['Afghanistan', 'Armenia', 'Azerbaijan', 'Bahrain', 'Bangladesh', 'Bhutan',
          'Brunei', 'Burma', 'Cambodia', 'China', 'East Timor', 'Georgia', 'Hong Kong',
          'India', 'Indonesia', 'Iran', 'Iraq', 'Israel', 'Japan', 'Jordan', 'Kazakhstan',
          'Kuwait', 'Kyrgyzstan', 'Laos', 'Lebanon', 'Malaysia', 'Mongolia', 'Nepal',
          'North Korea', 'Oman', 'Pakistan', 'Papua New Guinea', 'Philippines', 'Qatar',
          'Saudi Arabia', 'Singapore', 'South Korea', 'Sri Lanka', 'Syria', 'Taiwan',
          'Tajikistan', 'Thailand', 'Turkey', 'Turkmenistan', 'United Arab Emirates',
          'Uzbekistan', 'Vietnam', 'Yemen']
# US
list_4 = ['United States']
# Africa
list_5 = ['South Africa', 'Egypt', 'Morocco', 'Algeria', 'Ghana', 'Africa']

# np.select evaluates the conditions in order and, for each row,
# uses the choice belonging to the first condition that is True.
conditions = [
    df['Entity'].isin(list_4),
    df['Entity'].isin(list_2),
    df['Entity'].isin(list_5),
    df['Entity'].isin(list_3),
    df['Entity'].isin(list_1),
]
choices = ['US', "Rest of America", "Africa", "Asia", "Europe"]

# Rows that match none of the lists get the default label.
Region = np.select(conditions, choices, default='Rest of the world')
df['Region'] = Region
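
The behavior described in the question (only the first country of each group is matched) follows from how Python evaluates `or`: it returns its first truthy operand, so `('Brazil' or 'Canada' or 'Peru' or 'Chile')` collapses to `'Brazil'` before the `==` comparison runs. The sketch below illustrates this and contrasts it with `isin` plus `np.select`. The sample DataFrame and the shortened country lists are made up for illustration; only the `Entity` column name and the region labels come from the post.

```python
import numpy as np
import pandas as pd

# `or` returns the first truthy operand, so this is just the string 'Brazil'.
first_only = ('Brazil' or 'Canada' or 'Peru' or 'Chile')

# Hypothetical sample data (not the asker's dataset).
df = pd.DataFrame({'Entity': ['United States', 'Brazil', 'Canada', 'Egypt',
                              'India', 'Spain', 'Australia']})

# The question's pattern: the comparison is against 'Brazil' only,
# so 'Canada' is not matched.
broken = np.where(df['Entity'] == first_only, 'Rest of America', 'Unchanged')

# isin tests membership in the whole list (lists shortened for the example).
conditions = [
    df['Entity'].isin(['United States']),
    df['Entity'].isin(['Brazil', 'Canada', 'Peru', 'Chile']),
    df['Entity'].isin(['South Africa', 'Egypt', 'Morocco']),
    df['Entity'].isin(['India', 'China', 'Japan']),
    df['Entity'].isin(['Spain', 'Italy', 'Germany']),
]
choices = ['US', 'Rest of America', 'Africa', 'Asia', 'Europe']

# Each row gets the choice of the first True condition, else the default.
df['Region'] = np.select(conditions, choices, default='Rest of the world')

print(first_only)
print(list(broken))
print(df)
```

Because `np.select` takes the first matching condition, the order of `conditions` only matters if a country appears in more than one list.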