Pandas - Reshape/convert a DataFrame with multiple columns into a single column of values

Time: 2016-04-17 21:00:33

Tags: python pandas

I have a pandas DataFrame with years as columns and countries as row labels:

Country       | 1960 | 1961 | 1962 | 1963
-----------------------------------------
United States | 1000 | 2000 | 3000 | 4000
-----------------------------------------
Argentina     | 1000 | 2000 | 3000 | 4000
-----------------------------------------

I would like to convert it to:

Country       | Year | Value
-----------------------------
United States | 1960 | 1000
United States | 1961 | 2000
United States | 1962 | 3000
United States | 1963 | 4000
Argentina     | 1960 | 1000
Argentina     | 1961 | 2000
Argentina     | 1962 | 3000
Argentina     | 1963 | 4000

I'm not sure which split, sort, or group operations I need to apply to achieve this.

Thanks!

3 answers:

Answer 0: (score: 3)

You can use the stack method:

>>> import pandas as pd
>>> df = pd.DataFrame({"country": ["United States", "Argentina"],
...                    1960: [1000, 1000],
...                    1961: [2000, 2000],
...                    1962: [3000, 3000],
...                    1963: [4000, 4000]})
>>> df
   1960  1961        country  1963  1962
0  1000  2000  United States  4000  3000
1  1000  2000      Argentina  4000  3000
>>> df.set_index("country").stack()
country
United States  1960    1000
               1961    2000
               1963    4000
               1962    3000
Argentina      1960    1000
               1961    2000
               1963    4000
               1962    3000
dtype: int64
>>> df.set_index("country").stack().reset_index()
         country  level_1     0
0  United States     1960  1000
1  United States     1961  2000
2  United States     1963  4000
3  United States     1962  3000
4      Argentina     1960  1000
5      Argentina     1961  2000
6      Argentina     1963  4000
7      Argentina     1962  3000
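
The result above leaves the new columns with the default names level_1 and 0. As a minimal sketch (the names "Year" and "Value" are taken from the question, and long_df is just an illustrative variable name), the column axis can be named before stacking and the values column named in reset_index:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["United States", "Argentina"],
    1960: [1000, 1000],
    1961: [2000, 2000],
    1962: [3000, 3000],
    1963: [4000, 4000],
})

long_df = (
    df.set_index("country")
      .rename_axis(columns="Year")   # name the column axis so the stacked level is called "Year"
      .stack()                       # move the year columns into an inner index level
      .reset_index(name="Value")     # turn both index levels into columns; name the values "Value"
)
print(long_df)
```

On a Python version where dicts keep insertion order, the year columns also stay in the order 1960 to 1963, unlike the older output shown above.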

I hope this helps.

Answer 1: (score: 3)

Here is a complete example:

In [1]: df = pd.DataFrame([['United States', 1000, 2000, 3000, 4000],
                           ['Argentina', 1000, 2000, 3000, 4000]],
                          columns=['Country', 1960, 1961, 1962, 1963])

In [2]: df.set_index('Country', inplace=True)
In [3]: df = df.stack().reset_index()
In [4]: df.columns = ['Country', 'Year', 'Value']

which yields

         Country  Year  Value
0  United States  1960   1000
1  United States  1961   2000
2  United States  1962   3000
3  United States  1963   4000
4      Argentina  1960   1000
5      Argentina  1961   2000
6      Argentina  1962   3000
7      Argentina  1963   4000

To drop the integer index column and use the Country column as the index instead, replace steps 3 and 4 with

In [3]: df = df.stack().reset_index(1)
In [4]: df.columns = ['Year', 'Value']

which produces

               Year  Value
Country                   
United States  1960   1000
United States  1961   2000
United States  1962   3000
United States  1963   4000
Argentina      1960   1000
Argentina      1961   2000
Argentina      1962   3000
Argentina      1963   4000

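Answer 2: (score: 0)

This is not exactly what you want, but calling df.stack() directly (without first setting Country as the index) gives the following: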

0  Country    United States
    1960               1000
    1961               2000
    1962               3000
    1963               2300
1  Country        Argentina
    1960               1000
    1961               2000
    1962               3000
    1963               4000
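
In the output above, Country ends up mixed in with the stacked values because it is still an ordinary column. One possible alternative, not used in any of the answers here, is DataFrame.melt, which keeps an identifier column aside while unpivoting the rest. This is only a sketch: it assumes the frame from Answer 1, and long_df is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    [["United States", 1000, 2000, 3000, 4000],
     ["Argentina", 1000, 2000, 3000, 4000]],
    columns=["Country", 1960, 1961, 1962, 1963],
)

long_df = (
    df.melt(id_vars="Country", var_name="Year", value_name="Value")
      .sort_values(["Country", "Year"])  # melt groups rows by year; regroup them by country
      .reset_index(drop=True)            # discard the shuffled index left by the sort
)
print(long_df)
```

Note that sorting by Country puts Argentina before United States; skip the sort_values step if the year-by-year row order from melt is acceptable.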