I have a DataFrame with 4 header rows, like this:
A01 A01 A01 A01 A01 A01
1 1 1 2 2 2
Mon Mon Mon Tue Tue Tue
# Beverages # Appliances Avg. brewing duration # Beverages # Appliances Avg. brewing duration
Americano 549 46 "101,5" 542 38
ApplianceOffRinsing 28.718 673 "52,6" 28.718 665
ApplianceOnRinsing 35.381 682 "180,8" 35.308 676
CafeAuLait 112 16 "124,4" 99 10
How can I melt it in pandas?
I am already reading it like this:
df = pd.read_csv('sample.csv', header=[0, 1, 2, 3], delimiter='\t')
It now looks like this:
I would like the output to look like this:
A01 1 Mon # Beverages 549
A01 1 Mon # Appliances 46
...
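with column names that I set myself.
I know it is something like pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B']),
but I am not sure how to adapt it to my use case, especially since my data has no column names.
Adding sample data now...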
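Answer 0 (score: 0)
There is a problem: the first column should be the index, because it is the only text column.
As it stands, melting gives mixed numeric and text data in the output: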
df2 = df.melt(var_name=['a','b','c','d'], value_name='e')
print(df2)
a b c d e
0 A01 1 Mon #Beverages Americano
1 A01 1 Mon #Beverages ApplianceOffRinsing
2 A01 1 Mon #Beverages ApplianceOnRinsing
3 A01 1 Mon #Beverages CafeAuLait
4 A01 1 Mon #Appliances 549
5 A01 1 Mon #Appliances 28.718
6 A01 1 Mon #Appliances 35.381
7 A01 1 Mon #Appliances 112
8 A01 1 Mon Avg.brewingduration 46
9 A01 1 Mon Avg.brewingduration 673
10 A01 1 Mon Avg.brewingduration 682
11 A01 1 Mon Avg.brewingduration 16
12 A01 2 Tue #Beverages 101,5
13 A01 2 Tue #Beverages 52,6
14 A01 2 Tue #Beverages 180,8
15 A01 2 Tue #Beverages 124,4
16 A01 2 Tue #Appliances 542
17 A01 2 Tue #Appliances 28.718
18 A01 2 Tue #Appliances 35.308
19 A01 2 Tue #Appliances 99
20 A01 2 Tue Avg.brewingduration 38
21 A01 2 Tue Avg.brewingduration 665
22 A01 2 Tue Avg.brewingduration 676
23 A01 2 Tue Avg.brewingduration 10
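To avoid mixing the text labels into the values, one option is to read the first column as the index and keep it while melting. The following is only a sketch under some assumptions: the inline sample is a cut-down, tab-separated version of the data in which the first cell of every header row is empty, the column names (`product`, `machine`, `day_no`, `weekday`, `metric`, `value`) are made up for illustration, and `ignore_index=False` needs pandas 1.1 or newer.

```python
import io

import pandas as pd

# Hypothetical, cut-down sample: four header rows, tab-separated, with an
# empty first cell in each header row so the labels column has no header.
raw = (
    "\tA01\tA01\tA01\tA01\n"
    "\t1\t1\t2\t2\n"
    "\tMon\tMon\tTue\tTue\n"
    "\t# Beverages\t# Appliances\t# Beverages\t# Appliances\n"
    "Americano\t549\t46\t542\t38\n"
    "CafeAuLait\t112\t16\t99\t10\n"
)

# Four header rows become a 4-level column MultiIndex; the first column
# (the only text column) becomes the row index.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=[0, 1, 2, 3], index_col=0)

long_df = (
    df.melt(
        var_name=["machine", "day_no", "weekday", "metric"],  # illustrative names
        value_name="value",
        ignore_index=False,  # keep the product labels instead of dropping them
    )
    .rename_axis("product")  # illustrative name for the former index
    .reset_index()
)

print(long_df)
```

Each row of `long_df` then carries the product label, the four header levels, and a single numeric value, so the value column stays numeric.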