I am trying to plot some data with a Choropleth (specifically a dataset from GitHub on terrorism in EU countries).
I have something like this:
year country1 country2 country3
1970 10 20 30
1971 40 50 60
1972 70 80 90
As far as I understand, it should look like this instead:
year country value
1970 country1 10
1970 country2 20
1970 country3 30
1971 country1 40
1971 country2 50
1971 country3 60
1972 country1 70
1972 country2 80
1972 country3 90
How can I achieve this with Pandas? Is this a good way to approach the problem?
Thank you very much.
Answer 0 (score: 0)
This kind of task is just bamboo for pandas :)
You only need to stack your DataFrame:
>>> import pandas as pd
>>> # First, make `iyear` the index when reading the CSV into a DataFrame.
>>> df = pd.read_csv('eu_terrorism_fatalities_by_country.csv', index_col=0)
>>> df.iloc[0:5, 0:3] # Take a look
       Belgium  Denmark  France
iyear
1970         0        0       0
1971         0        0       0
1972         0        0       1
1973         0        0       5
1974         0        0       3
>>> res = df.stack() # Just this simple :D
>>> res.head() # That's it.
iyear
1970   Belgium    0
       Denmark    0
       France     0
       Germany    0
       Greece     2
dtype: int64
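For comparison, here is the same call applied to the toy table from the question, as a minimal sketch. The names country1 to country3 are the question's placeholders, and the frame is built by hand rather than read from a file, so this is an illustration and not the real dataset.

```python
import pandas as pd

# Toy data copied from the question; the column names are placeholders.
wide = pd.DataFrame(
    {
        "country1": [10, 40, 70],
        "country2": [20, 50, 80],
        "country3": [30, 60, 90],
    },
    index=pd.Index([1970, 1971, 1972], name="year"),
)

# stack() moves the column labels into an inner index level,
# so each (year, country) pair gets exactly one value.
stacked = wide.stack()
print(stacked)
```

Each of the 3 years is paired with each of the 3 columns, so the stacked Series has 9 entries.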
Note that the result res is a Series with a MultiIndex. A few follow-up steps:
>>> res.index.names = ['year', 'country']
>>> res.name = 'value'
>>> res.head()
year  country
1970  Belgium    0
      Denmark    0
      France     0
      Germany    0
      Greece     2
Name: value, dtype: int64
>>> res.to_csv('results.csv', header=True)
The results.csv file then contains:
year,country,value
1970,Belgium,0
1970,Denmark,0
1970,France,0
... ...
2014,Portugal,0
2014,Spain,0
2014,United Kingdom,0
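To see how the index names and the Series name end up in the CSV header without touching the disk, a small sketch along these lines may help. It uses a hand-made two-year frame with placeholder column names, not the real dataset, and relies on to_csv returning the text when no path is given.

```python
import pandas as pd

# Hypothetical miniature of the wide table; the values are made up.
wide = pd.DataFrame(
    {"country1": [10, 40], "country2": [20, 50]},
    index=pd.Index([1970, 1971], name="year"),
)

stacked = wide.stack()
stacked.index.names = ["year", "country"]  # becomes the first two header fields
stacked.name = "value"                     # becomes the last header field

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = stacked.to_csv(header=True)
print(csv_text)
```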
If you want to convert the MultiIndex Series res into a DataFrame, as your comment asks, just use reset_index and its arguments to control the behavior:
>>> flat = res.reset_index()
>>> flat.head()
   year  country  value
0  1970  Belgium      0
1  1970  Denmark      0
2  1970   France      0
3  1970  Germany      0
4  1970   Greece      2
>>> flat2 = res.reset_index(level=1)
>>> flat2.head()
      country  value
year
1970  Belgium      0
1970  Denmark      0
1970   France      0
1970  Germany      0
1970   Greece      2
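As an aside, a different technique from stack can give the same flat table in one step: DataFrame.melt. The sketch below assumes the year is an ordinary column rather than the index, and again uses the question's placeholder names. It is only one possible alternative, not something the steps above require.

```python
import pandas as pd

# Toy data from the question, with `year` kept as a regular column.
wide = pd.DataFrame(
    {
        "year": [1970, 1971, 1972],
        "country1": [10, 40, 70],
        "country2": [20, 50, 80],
        "country3": [30, 60, 90],
    }
)

# melt() keeps `year` as an identifier and unpivots every other column
# into (country, value) pairs.
flat = wide.melt(id_vars="year", var_name="country", value_name="value")

# melt() lists all rows of country1 first, then country2, and so on,
# so sort to get the year-by-year order shown in the question.
flat = flat.sort_values(["year", "country"]).reset_index(drop=True)
print(flat)
```

The choice between the two mostly depends on whether the year already lives in the index (stack) or in a column (melt).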