Suppose this is my input data:
data = [("France", "Paris", "Male", "1"),
("France", "Paris", "Female", "6"),
("France", "Nice", "Male", "2"),
("France", "Nice", "Female", "7"),
("Germany", "Berlin", "Male", "3"),
("Germany", "Berlin", "Female", "8"),
("Germany", "Munchen", "Male", "4"),
("Germany", "Munchen", "Female", "9"),
("Germany", "Koln", "Male", "5"),
("Germany", "Koln", "Female", "10")]
I want to get it into a DataFrame that looks like this:
Country  City     Sex
                  Male  Female
France   Paris    1     6
         Nice     2     7
Germany  Berlin   3     8
         Munchen  4     9
         Koln     5     10
The first part is easy:
import pandas as pd

df = pd.DataFrame(data, columns=["country", "city", "sex", "count"])
df = df.set_index(["country", "city"])
which gives me this output:
                    sex count
country city
France  Paris      Male     1
        Paris    Female     6
        Nice       Male     2
        Nice     Female     7
Germany Berlin     Male     3
        Berlin   Female     8
        Munchen    Male     4
        Munchen  Female     9
        Koln       Male     5
        Koln     Female    10
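So the rows are fine, but now I want to move the values of the sex column up into the columns, as a multi-level column index. Is that possible, and if so, how?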
Answer 0 (score: 2)
Add the sex column to the list passed to set_index, then call unstack:
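# Start from the original DataFrame (before the set_index call in the question)
df = df.set_index(["country", "city", "sex"]).unstack()
# Clean-up: clear the column level names and rename the top-level label "count" to "Sex"
df = df.rename_axis((None, None), axis=1).rename(columns={"count": "Sex"})
print(df)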
                   Sex
                Female Male
country city
France  Nice         7    2
        Paris        6    1
Germany Berlin       8    3
        Koln        10    5
        Munchen      9    4
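
Note that unstack sorts the labels, so Female comes before Male and the cities are in alphabetical order, unlike the layout requested in the question. If the original order matters, one possible follow-up is to reindex after unstacking. This is only a sketch under stated assumptions: converting count to integers is not part of the question, and the names wide, row_order, and result are hypothetical helpers introduced for illustration.

```python
import pandas as pd

data = [("France", "Paris", "Male", "1"),
        ("France", "Paris", "Female", "6"),
        ("France", "Nice", "Male", "2"),
        ("France", "Nice", "Female", "7"),
        ("Germany", "Berlin", "Male", "3"),
        ("Germany", "Berlin", "Female", "8"),
        ("Germany", "Munchen", "Male", "4"),
        ("Germany", "Munchen", "Female", "9"),
        ("Germany", "Koln", "Male", "5"),
        ("Germany", "Koln", "Female", "10")]

df = pd.DataFrame(data, columns=["country", "city", "sex", "count"])
# Assumption: the counts are meant to be numbers, so convert the strings.
df["count"] = df["count"].astype(int)

# Unstack the count Series: one column per value of sex (sorted labels).
wide = df.set_index(["country", "city", "sex"])["count"].unstack("sex")

# Put the columns in the order requested in the question.
wide = wide[["Male", "Female"]]

# Restore the row order in which (country, city) pairs first appear.
row_order = pd.MultiIndex.from_frame(df[["country", "city"]].drop_duplicates())
wide = wide.reindex(row_order)

# Add "Sex" as the top column level.
result = pd.concat({"Sex": wide}, axis=1)
print(result)
```

Selecting the count Series before unstacking avoids the extra "count" column level, so no rename is needed afterwards.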
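Answer 1 (score: 0)
Another approach is to use pivot instead of unstack (the two are almost the same). Starting from the original DataFrame: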
df.set_index(['country','city']).pivot(columns='sex')
                 count
sex             Female Male
country city
France  Nice         7    2
        Paris        6    1
Germany Berlin       8    3
        Koln        10    5
        Munchen      9    4
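
To see how close the two approaches are, a small check like the following can be run. It is a sketch, assuming a reasonably recent pandas version in which pivot called without an index argument keeps the existing index; via_unstack, via_pivot, and same are hypothetical names used only here.

```python
import pandas as pd

data = [("France", "Paris", "Male", "1"),
        ("France", "Paris", "Female", "6"),
        ("France", "Nice", "Male", "2"),
        ("France", "Nice", "Female", "7"),
        ("Germany", "Berlin", "Male", "3"),
        ("Germany", "Berlin", "Female", "8"),
        ("Germany", "Munchen", "Male", "4"),
        ("Germany", "Munchen", "Female", "9"),
        ("Germany", "Koln", "Male", "5"),
        ("Germany", "Koln", "Female", "10")]

df = pd.DataFrame(data, columns=["country", "city", "sex", "count"])

# Approach from the first answer, without the renaming step.
via_unstack = df.set_index(["country", "city", "sex"]).unstack()

# Approach from this answer: keep (country, city) as the index and pivot on sex.
via_pivot = df.set_index(["country", "city"]).pivot(columns="sex")

# Compare the two frames (values, index, and column labels).
same = via_unstack.equals(via_pivot)
print(same)
```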