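Hello Stack Overflow community,
I have a df with a column named "native_country". I want to create a new column that groups these countries by continent. For example, China would be grouped with all the other countries that belong to Asia. The code is shown below.
First, I create a ContinentDict to hold the country/continent pairs: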
ContinentDict = {'China': 'Asia', 'Cambodia': 'Asia', 'Hong': 'Asia',
                 'India': 'Asia', 'Japan': 'Asia', 'Laos': 'Asia',
                 'Philippines': 'Asia',
                 'South': 'Asia', 'Taiwan': 'Asia', 'Thailand': 'Asia',
                 'Vietnam': 'Asia', 'Canada': 'Canada',
                 'United States': 'United States',
                 'Cuba': 'Caribbean', 'Dominican-Republic': 'Caribbean',
                 'Haiti': 'Caribbean', 'Jamaica': 'Caribbean',
                 'Trinadad&Tobago': 'Caribbean',
                 'England': 'Europe', 'France': 'Europe', 'Germany': 'Europe',
                 'Greece': 'Europe', 'Holand-Netherlands': 'Europe',
                 'Hungary': 'Europe',
                 'Ireland': 'Europe', 'Italy': 'Europe', 'Poland': 'Europe',
                 'Portugal': 'Europe', 'Scotland': 'Europe',
                 'Yugoslavia': 'Europe',
                 'Columbia': 'Latin America', 'Ecuador': 'Latin America',
                 'El-Salvador': 'Latin America', 'Guatemala': 'Latin America',
                 'Honduras': 'Latin America', 'Nicaragua': 'Latin America',
                 'Peru': 'Latin America', 'Mexico': 'Mexico', '?': 'Unknown',
                 'Outlying-US(Guam-USVI-etc)': 'US Territories',
                 'Puerto-Rico': 'US Territories'}
Next, I assign the continent to df:
df = df.assign(continent=df['native_country'].map(ContinentDict))
However, the continent column is filled with NaN. Does anyone know why? Is there something I am missing?
Any help would be greatly appreciated!
Answer 0 (score: 0)
df.iloc[df['native_country'].map(ContinentDict).argsort()]
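The line above reorders the rows by mapped continent; it does not by itself explain the NaN values. One possible cause, offered as an assumption because the question does not show the raw data: `Series.map` with a dict returns NaN for every value that is not an exact key, so stray whitespace or a different spelling (for example `' United-States'` versus `'United States'`) would leave the column empty. A minimal sketch with a hypothetical three-row sample and a shortened stand-in dictionary:

```python
import pandas as pd

# Hypothetical sample; the leading spaces and the hyphen are assumptions
# about what the raw column might look like.
sample = pd.DataFrame({'native_country': [' China', ' United-States', 'Cuba']})

# Shortened stand-in for ContinentDict.
continent_dict = {'China': 'Asia', 'United States': 'United States',
                  'Cuba': 'Caribbean'}

# Exact-match lookup: values that are not keys become NaN.
raw = sample['native_country'].map(continent_dict)

# Strip surrounding whitespace before the lookup.
cleaned = sample['native_country'].str.strip()
mapped = cleaned.map(continent_dict)

# List the values that still have no match so the dictionary can be fixed.
unmatched = sorted(cleaned[mapped.isna()].unique())
print(unmatched)
```

If the list of unmatched values is not empty, either the dictionary keys or the column values need to be adjusted until they agree exactly.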
Answer 1 (score: 0)
import pandas as pd
# Build a demo frame with one row per country in the dictionary.
df = pd.DataFrame({'native_country': list(ContinentDict)})
df = df.assign(continent=df['native_country'].map(ContinentDict))
>>> df.head()
       native_country      continent
0              Canada         Canada
1            Honduras  Latin America
2                Hong           Asia
3  Dominican-Republic      Caribbean
4               Italy         Europe
midx = pd.MultiIndex.from_arrays([df['continent'], df['native_country']])
>>> midx
MultiIndex(levels=[[u'Asia', u'Canada', u'Caribbean', u'Europe', u'Latin America', u'Mexico', u'US Territories', u'United States', u'Unknown'], [u'?', u'Cambodia', u'Canada', u'China', u'Columbia', u'Cuba', u'Dominican-Republic', u'Ecuador', u'El-Salvador', u'England', u'France', u'Germany', u'Greece', u'Guatemala', u'Haiti', u'Holand-Netherlands', u'Honduras', u'Hong', u'Hungary', u'India', u'Ireland', u'Italy', u'Jamaica', u'Japan', u'Laos', u'Mexico', u'Nicaragua', u'Outlying-US(Guam-USVI-etc)', u'Peru', u'Philippines', u'Poland', u'Portugal', u'Puerto-Rico', u'Scotland', u'South', u'Taiwan', u'Thailand', u'Trinadad&Tobago', u'United States', u'Vietnam', u'Yugoslavia']],
labels=[[1, 4, 0, 2, 3, 4, 5, 6, 0, 3, 3, 3, 0, 3, 7, 4, 0, 3, 0, 4, 0, 0, 2, 3, 3, 2, 4, 2, 0, 6, 4, 0, 3, 2, 3, 0, 0, 3, 4, 3, 8], [2, 16, 17, 6, 21, 28, 25, 27, 29, 18, 33, 40, 1, 10, 38, 7, 39, 20, 24, 4, 36, 34, 22, 9, 31, 5, 8, 14, 19, 32, 13, 3, 15, 37, 12, 23, 35, 11, 26, 30, 0]],
names=[u'continent', u'native_country'])
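The `labels=` field in the output above comes from an older pandas release; as far as I know, pandas 0.24 and later expose the same information as `codes`. A small sketch, using a hypothetical two-row frame rather than the full data, of how levels and codes relate:

```python
import pandas as pd

# Hypothetical two-row frame, just to keep the output readable.
small = pd.DataFrame({'continent': ['Europe', 'Asia'],
                      'native_country': ['Italy', 'Japan']})
midx = pd.MultiIndex.from_arrays([small['continent'], small['native_country']])

# levels hold the sorted unique values of each input array.
levels = [list(level) for level in midx.levels]
# codes are integer positions into the levels, one entry per row.
codes = [code.tolist() for code in midx.codes]

print(levels)
print(codes)
```

Each row of the index can be recovered by looking up its code in the matching level.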
Once the DataFrame has both the country and the continent, you only need to set the index:
df = df.assign(data=1)
>>> df.set_index(['continent', 'native_country']).sort_index()
                                           data
continent      native_country
Asia           Cambodia                       1
               China                          1
               Hong                           1
               India                          1
               Japan                          1
               Laos                           1
               Philippines                    1
               South                          1
               Taiwan                         1
               Thailand                       1
               Vietnam                        1
Canada         Canada                         1
Caribbean      Cuba                           1
               Dominican-Republic             1
               Haiti                          1
               Jamaica                        1
               Trinadad&Tobago                1
Europe         England                        1
               France                         1
               Germany                        1
               Greece                         1
               Holand-Netherlands             1
               Hungary                        1
               Ireland                        1
               Italy                          1
               Poland                         1
               Portugal                       1
               Scotland                       1
               Yugoslavia                     1
Latin America  Columbia                       1
               Ecuador                        1
               El-Salvador                    1
               Guatemala                      1
               Honduras                       1
               Nicaragua                      1
               Peru                           1
Mexico         Mexico                         1
US Territories Outlying-US(Guam-USVI-etc)     1
               Puerto-Rico                    1
United States  United States                  1
Unknown        ?                              1
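If the goal is only to summarise rows per continent rather than to build a hierarchical index, a `groupby` on the mapped column may be enough. This is a sketch under assumptions: the frame `people` and the shortened `continent_dict` are hypothetical stand-ins for the real data and for ContinentDict.

```python
import pandas as pd

# Shortened stand-in for ContinentDict.
continent_dict = {'China': 'Asia', 'Japan': 'Asia', 'Italy': 'Europe'}

# Hypothetical data with one row per person.
people = pd.DataFrame({'native_country': ['China', 'Italy', 'Japan', 'China']})
people = people.assign(continent=people['native_country'].map(continent_dict))

# Number of rows that fall into each continent.
counts = people.groupby('continent').size()
print(counts)
```

Rows whose continent is NaN are dropped by `groupby` by default, so a total that is smaller than the number of rows is another hint that some countries did not match a dictionary key.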