I have a pandas DataFrame with years as columns and countries as row labels:
Country | 1960 | 1961 | 1962 | 1963
-----------------------------------------
United States | 1000 | 2000 | 3000 | 4000
-----------------------------------------
Argentina | 1000 | 2000 | 3000 | 4000
-----------------------------------------
I would like to convert it to:
Country | Year | Value
-----------------------------
United States | 1960 | 1000
United States | 1961 | 2000
United States | 1962 | 3000
United States | 1963 | 4000
Argentina | 1960 | 1000
Argentina | 1961 | 2000
Argentina | 1962 | 3000
Argentina | 1963 | 4000
I'm not sure which split, sort, or group operations I need to apply to achieve this.
Thanks!
Answer 0: (score: 3)
You can use the stack method:
>>> import pandas as pd
>>> df = pd.DataFrame({"country": ["United States", "Argentina"],
...                    1960: [1000, 1000],
...                    1961: [2000, 2000],
...                    1962: [3000, 3000],
...                    1963: [4000, 4000]})
>>> df
   1960  1961        country  1963  1962
0  1000  2000  United States  4000  3000
1  1000  2000      Argentina  4000  3000
>>> df.set_index("country").stack()
country
United States  1960    1000
               1961    2000
               1963    4000
               1962    3000
Argentina      1960    1000
               1961    2000
               1963    4000
               1962    3000
dtype: int64
>>> df.set_index("country").stack().reset_index()
         country  level_1     0
0  United States     1960  1000
1  United States     1961  2000
2  United States     1963  4000
3  United States     1962  3000
4      Argentina     1960  1000
5      Argentina     1961  2000
6      Argentina     1963  4000
7      Argentina     1962  3000
I hope this helps.
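The result above still has the generic column names level_1 and 0. As a minimal sketch (not part of the original answer), the names can be set in the same chain: rename_axis names the column axis before stacking, and reset_index(name=...) names the value column. The names "Year" and "Value" are taken from the question; long_df is just an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({"country": ["United States", "Argentina"],
                   1960: [1000, 1000],
                   1961: [2000, 2000],
                   1962: [3000, 3000],
                   1963: [4000, 4000]})

long_df = (
    df.set_index("country")
      .rename_axis(columns="Year")   # the stacked index level inherits this name
      .stack()                       # years move from columns into the index
      .reset_index(name="Value")     # index levels become columns; values are named "Value"
)
print(long_df)
```

This gives three columns (country, Year, Value) with one row per country and year, so no separate renaming step is needed afterwards.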
Answer 1: (score: 3)
As a complete example,
In [1]: df = pd.DataFrame([['United States', 1000, 2000, 3000, 4000],
   ...:                    ['Argentina', 1000, 2000, 3000, 4000]],
   ...:                   columns=['Country', 1960, 1961, 1962, 1963])
In [2]: df.set_index('Country', inplace=True)
In [3]: df = df.stack().reset_index()
In [4]: df.columns = ['Country', 'Year', 'Value']
yields
         Country  Year  Value
0  United States  1960   1000
1  United States  1961   2000
2  United States  1962   3000
3  United States  1963   4000
4      Argentina  1960   1000
5      Argentina  1961   2000
6      Argentina  1962   3000
7      Argentina  1963   4000
To drop the integer index and use the Country column as the index instead, replace steps 3 and 4 with
In [3]: df = df.stack().reset_index(1)
In [4]: df.columns = ['Year', 'Value']
which yields
               Year  Value
Country
United States  1960   1000
United States  1961   2000
United States  1962   3000
United States  1963   4000
Argentina      1960   1000
Argentina      1961   2000
Argentina      1962   3000
Argentina      1963   4000
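For reference, the second variant can be written as one self-contained script. This is a sketch that avoids inplace=True and uses the illustrative names wide and by_country, which do not appear in the answer. Passing level=1 to reset_index moves only the inner (year) level of the stacked index back into a column, so Country stays as the index.

```python
import pandas as pd

df = pd.DataFrame([['United States', 1000, 2000, 3000, 4000],
                   ['Argentina', 1000, 2000, 3000, 4000]],
                  columns=['Country', 1960, 1961, 1962, 1963])

# Use the country names as row labels so that only the year columns get stacked
wide = df.set_index('Country')

# stack() produces a Series indexed by (Country, year);
# reset_index(level=1) turns just the year level into a regular column
by_country = wide.stack().reset_index(level=1)
by_country.columns = ['Year', 'Value']
print(by_country)
```

Keeping Country as the index makes label-based lookups such as by_country.loc['Argentina'] return all rows for that country.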
Answer 2: (score: 0)
This is not exactly what you want, but with df.stack() you can get the following:
0  Country    United States
   1960                1000
   1961                2000
   1962                3000
   1963                4000
1  Country        Argentina
   1960                1000
   1961                2000
   1962                3000
   1963                4000
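The country names end up among the values here because Country is an ordinary column when stack() is called, so it is stacked together with the year columns. A small sketch of the difference, assuming the same sample data as the question (mixed and clean are illustrative names):

```python
import pandas as pd

df = pd.DataFrame([['United States', 1000, 2000, 3000, 4000],
                   ['Argentina', 1000, 2000, 3000, 4000]],
                  columns=['Country', 1960, 1961, 1962, 1963])

# Stacking directly: "Country" is treated like any other column,
# so the result mixes strings and integers in a single object Series
mixed = df.stack()

# Setting the index first keeps the country names as labels,
# leaving only the numeric year values to be stacked
clean = df.set_index('Country').stack()

print(mixed)
print(clean)
```

Calling set_index('Country') before stack() is what turns this output into the Country/Year/Value layout asked for in the question.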