I have the following pandas DataFrame:
+---+-----+-----+------+------+------+------+
| | A | B | C_10 | C_20 | D_10 | D_20 |
+---+-----+-----+------+------+------+------+
| 1 | 0.1 | 0.2 | 1 | 2 | 3 | 4 |
| 2 | 0.3 | 0.4 | 5 | 6 | 7 | 8 |
+---+-----+-----+------+------+------+------+
Now I want to melt the columns C_10, C_20, D_10, and D_20 to get a DataFrame like this:
+---+-----+-----+----+---+---+
| | A | B | N | C | D |
+---+-----+-----+----+---+---+
| 1 | 0.1 | 0.2 | 10 | 1 | 3 |
| 1 | 0.1 | 0.2 | 20 | 2 | 4 |
| 2 | 0.3 | 0.4 | 10 | 5 | 7 |
| 2 | 0.3 | 0.4 | 20 | 6 | 8 |
+---+-----+-----+----+---+---+
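Is there a simple way to do this? Thanks!
Edit: I tried wide_to_long, but it does not work when the DataFrame contains duplicate rows (rows that share the same combination, A, and B values):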
import pandas as pd

df = pd.DataFrame({
    'combination': [1, 1, 2, 2],
    'A': [0.1, 0.1, 0.2, 0.2],
    'B': [0.3, 0.3, 0.4, 0.4],
    'C_10': [1, 5, 6, 7],
    'C_20': [2, 6, 7, 8],
    'D_10': [3, 7, 8, 9],
    'D_20': [4, 8, 9, 10],
})
+--------------------------------------------------+
| combination A B C_10 C_20 D_10 D_20 |
+--------------------------------------------------+
| 0 1 0.1 0.3 1 2 3 4 |
| 1 1 0.1 0.3 5 6 7 8 |
| 2 2 0.2 0.4 6 7 8 9 |
| 3 2 0.2 0.4 7 8 9 10 |
+--------------------------------------------------+
If I use wide_to_long, I get the following error:
> pd.wide_to_long(df, stubnames=['C','D'], i=['combination', 'A', 'B'], j='N', sep='_').reset_index()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-31-cc5863fa7ecc> in <module>
----> 1 pd.wide_to_long(df, stubnames=['C','D'], i=['combination', 'A', 'B'], j='N', sep='_').reset_index()
pandas/core/reshape/melt.py in wide_to_long(df, stubnames, i, j, sep, suffix)
456
457 if df[i].duplicated().any():
--> 458 raise ValueError("the id variables need to uniquely identify each row")
459
460 value_vars = [get_var_names(df, stub, sep, suffix) for stub in stubnames]
ValueError: the id variables need to uniquely identify each row
The parameter i is described as "Column(s) to use as id variable(s)", but I don't understand what that actually means.
Answer 0 (score: 2)
Use wide_to_long:
df = pd.wide_to_long(df, stubnames=['C','D'], i=['A','B'], j='N', sep='_').reset_index()
print (df)
A B N C D
0 0.1 0.2 10 1 3
1 0.1 0.2 20 2 4
2 0.3 0.4 10 5 7
3 0.3 0.4 20 6 8
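For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the first table from the question by hand (the default 0-based index is an assumption; the question shows labels 1 and 2), and the name long_df is only illustrative.

```python
import pandas as pd

# Data typed in from the first table in the question.
df = pd.DataFrame({
    "A": [0.1, 0.3],
    "B": [0.2, 0.4],
    "C_10": [1, 5],
    "C_20": [2, 6],
    "D_10": [3, 7],
    "D_20": [4, 8],
})

# stubnames are the column prefixes, sep is the separator before the suffix,
# and the suffix itself (10, 20) ends up in the new column named by j.
long_df = pd.wide_to_long(
    df, stubnames=["C", "D"], i=["A", "B"], j="N", sep="_"
).reset_index()

print(long_df)
```

This works here because every (A, B) pair occurs only once, so those two columns are enough to identify each original row.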
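Edit: If the combinations of values in columns A, B are not unique, you can create a helper column by converting the index into a column named index, apply the same solution, and finally drop the index level: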
df = (pd.wide_to_long(df.reset_index(),
                      stubnames=['C','D'],
                      i=['index','A','B'],
                      j='N',
                      sep='_')
        .reset_index(level=0, drop=True)
        .reset_index())
print(df)
A B N combination C D
0 0.1 0.3 10 1 1 3
1 0.1 0.3 20 1 2 4
2 0.1 0.3 10 1 5 7
3 0.1 0.3 20 1 6 8
4 0.2 0.4 10 2 6 8
5 0.2 0.4 20 2 7 9
6 0.2 0.4 10 2 7 9
7 0.2 0.4 20 2 8 10
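On what i means: the traceback in the question shows that wide_to_long raises when df[i].duplicated().any() is true, so the columns passed as i have to form a key that is unique per row. The sketch below runs that same check on the question's second DataFrame and then applies the helper-column idea. The column name row_id is a made-up name for illustration, not something pandas requires, and this is only one possible way to write it.

```python
import pandas as pd

# Second DataFrame from the question: (combination, A, B) repeats across rows.
df = pd.DataFrame({
    "combination": [1, 1, 2, 2],
    "A": [0.1, 0.1, 0.2, 0.2],
    "B": [0.3, 0.3, 0.4, 0.4],
    "C_10": [1, 5, 6, 7],
    "C_20": [2, 6, 7, 8],
    "D_10": [3, 7, 8, 9],
    "D_20": [4, 8, 9, 10],
})

id_cols = ["combination", "A", "B"]

# The same test that appears in the traceback: True means the id columns
# do not uniquely identify each row, so wide_to_long would raise.
has_duplicates = df[id_cols].duplicated().any()
print(has_duplicates)  # True

# Turn the row index into a column (hypothetical name "row_id") so that
# every row gets a unique key, reshape, then discard the helper column.
with_id = df.rename_axis("row_id").reset_index()
long_df = (
    pd.wide_to_long(
        with_id, stubnames=["C", "D"], i=["row_id"] + id_cols, j="N", sep="_"
    )
    .reset_index()
    .drop(columns="row_id")
)

print(long_df)
```

Listing combination in i as well keeps it next to A and B in the result, rather than after N as in the output above; either placement holds the same data.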