Question

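I'm new to Python and have a DataFrame that needs some fairly involved reshaping. It's easiest to describe with an example using dummy data:

I have this:

I need this: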

The original DataFrame is:

import pandas as pd

testdata = [('State', ['CA', 'FL', 'ON']),
     ('Country', ['US', 'US', 'CAN']),
     ('a1', [0.059485629, 0.968962817, 0.645435903]),
     ('b2', [0.336665658, 0.404398227, 0.333113735]),
     ('Test', ['Test1', 'Test2', 'Test3']),
     ('d', [20, 18, 24]),
     ('e', [21, 16, 25]),
     ]
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order.
df = pd.DataFrame(dict(testdata))

The DataFrame I'm after is:

testdata2 = [('State', ['CA', 'CA',  'FL', 'FL', 'ON', 'ON']),
     ('Country', ['US', 'US', 'US', 'US', 'CAN', 'CAN']),
     ('Test', ['Test1', 'Test1', 'Test2', 'Test2',  'Test3', 'Test3']),
     ('Measurements', ['a1', 'b2', 'a1', 'b2',  'a1', 'b2']),
     ('Values', [0.059485629, 0.336665658,  0.968962817, 0.404398227, 0.645435903, 0.333113735]),
     ('Steps', [20,  21, 18,  16, 24, 25]),
     ]
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order.
dfn = pd.DataFrame(dict(testdata2))

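It looks like the solution might involve melt, stack, and a MultiIndex, but I'm not sure how to put all of that together.

Any suggested solutions would be greatly appreciated.

Thanks.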

Answer 1

Try this:

id_cols = ['State', 'Country', 'Test']
# Melt the measurement columns and the step columns separately.
df1 = df.melt(id_vars=id_cols, value_vars=['a1', 'b2'], value_name='Values', var_name='Measurements')
df2 = df.melt(id_vars=id_cols, value_vars=['d', 'e'], value_name='Steps').drop('variable', axis=1)
# Both melts return rows in the same order, so align on the index only;
# recent pandas versions raise an error when `on` is combined with left_index/right_index.
df1.merge(df2[['Steps']], left_index=True, right_index=True)

Output:

  State Country   Test Measurements    Values  Steps
0    CA      US  Test1           a1  0.059486     20
1    FL      US  Test2           a1  0.968963     18
2    ON     CAN  Test3           a1  0.645436     24
3    CA      US  Test1           b2  0.336666     21
4    FL      US  Test2           b2  0.404398     16
5    ON     CAN  Test3           b2  0.333114     25
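
The output above lists all a1 rows before the b2 rows, whereas the layout requested in the question groups the rows by state. One possible way to get that ordering is to sort on the id columns plus the measurement name. This is only a sketch: it joins on the row index (an assumption that holds here because both melts emit rows in the same order), and the names `values`, `steps`, and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['CA', 'FL', 'ON'],
    'Country': ['US', 'US', 'CAN'],
    'a1': [0.059485629, 0.968962817, 0.645435903],
    'b2': [0.336665658, 0.404398227, 0.333113735],
    'Test': ['Test1', 'Test2', 'Test3'],
    'd': [20, 18, 24],
    'e': [21, 16, 25],
})

id_cols = ['State', 'Country', 'Test']

# Melt the two column groups separately; rows come out in the same order.
values = df.melt(id_vars=id_cols, value_vars=['a1', 'b2'],
                 var_name='Measurements', value_name='Values')
steps = df.melt(id_vars=id_cols, value_vars=['d', 'e'], value_name='Steps')

# Attach Steps by row position (index), then group the rows by state.
result = values.join(steps['Steps'])
result = result.sort_values(id_cols + ['Measurements']).reset_index(drop=True)

print(result)
```

Sorting works here because the states happen to be in alphabetical order already; for data where the original row order matters, a different key would be needed.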

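Or use @JohnGalt's solution:

# Note: the result repeats the id columns and has two 'variable'/'value' pairs, so it still needs tidying.
pd.concat([pd.melt(df, id_vars=['State', 'Country', 'Test'], value_vars=x) for x in [['d', 'e'], ['a1', 'b2']]], axis=1)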
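
As written, the concat one-liner returns the id columns twice and two generic `variable`/`value` pairs. A hedged sketch of how the same idea could be tidied into the requested columns follows; the `groups` mapping and the temporary `*_source` column names are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['CA', 'FL', 'ON'],
    'Country': ['US', 'US', 'CAN'],
    'a1': [0.059485629, 0.968962817, 0.645435903],
    'b2': [0.336665658, 0.404398227, 0.333113735],
    'Test': ['Test1', 'Test2', 'Test3'],
    'd': [20, 18, 24],
    'e': [21, 16, 25],
})

id_cols = ['State', 'Country', 'Test']

# Hypothetical mapping: output column name -> source columns to melt into it.
groups = {'Values': ['a1', 'b2'], 'Steps': ['d', 'e']}

# Melt each group, naming the value column after the group.
melted = [
    pd.melt(df, id_vars=id_cols, value_vars=cols,
            var_name=name + '_source', value_name=name)
    for name, cols in groups.items()
]

# Keep the id columns from the first piece only, then place the pieces side by side.
tidy = pd.concat([melted[0]] + [m.drop(columns=id_cols) for m in melted[1:]], axis=1)

# The first group's source column carries the measurement label; the second is redundant.
tidy = tidy.rename(columns={'Values_source': 'Measurements'}).drop(columns='Steps_source')

print(tidy)
```

Like the merge version, this relies on every melt returning rows in the same order, so the pieces line up by position.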

Answer 2

There is a way to do this with pd.wide_to_long, but you have to rename the columns first so that the Measurements column ends up with the right values:

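df1 = df.rename(columns={'a1':'Values_a1', 'b2':'Values_b2', 'd':'Steps_a1', 'e':'Steps_b2'})
# `suffix` is a regular expression; '.' matches a single character only,
# so r'\w+' is used here to match the two-character suffixes 'a1' and 'b2'.
pd.wide_to_long(df1, 
                stubnames=['Values', 'Steps'], 
                i=['State', 'Country', 'Test'], 
                j='Measurements', 
                sep='_', 
                suffix=r'\w+').reset_index()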

  State Country   Test Measurements    Values  Steps
0    CA      US  Test1           a1  0.059486     20
1    CA      US  Test1           b2  0.336666     21
2    FL      US  Test2           a1  0.968963     18
3    FL      US  Test2           b2  0.404398     16
4    ON     CAN  Test3           a1  0.645436     24
5    ON     CAN  Test3           b2  0.333114     25
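
Neither answer uses stack or a MultiIndex, which the question asks about. For comparison, here is a minimal sketch of that route. It assumes that `d` holds the steps for `a1` and `e` the steps for `b2`, as the desired output implies; the names `wide` and `long` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['CA', 'FL', 'ON'],
    'Country': ['US', 'US', 'CAN'],
    'a1': [0.059485629, 0.968962817, 0.645435903],
    'b2': [0.336665658, 0.404398227, 0.333113735],
    'Test': ['Test1', 'Test2', 'Test3'],
    'd': [20, 18, 24],
    'e': [21, 16, 25],
})

id_cols = ['State', 'Country', 'Test']

# Move the id columns into the index; the remaining columns are a1, b2, d, e.
wide = df.set_index(id_cols)

# Relabel them with two levels: (output column, measurement name).
wide.columns = pd.MultiIndex.from_tuples(
    [('Values', 'a1'), ('Values', 'b2'), ('Steps', 'a1'), ('Steps', 'b2')],
    names=[None, 'Measurements'],
)

# Stacking the 'Measurements' level turns it into an index level (one row per
# measurement); reset_index then restores ordinary columns.
long = wide.stack(level='Measurements').reset_index()

# Pick the column order explicitly, since stack may reorder the value columns.
long = long[id_cols + ['Measurements', 'Values', 'Steps']]

print(long)
```

Compared with wide_to_long, this spells out the column-to-measurement pairing in the MultiIndex instead of encoding it in renamed column strings.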

Reshaping a DataFrame using melt, stack, and MultiIndex?

2 answers: