I have an Excel file that looks like this:
A B C D E F G
run_1_clust_1.out: GLU 2 HN 2.07 3851 -0.90
GLY 1 HN 2.09 3196 -0.90
3 HN 2.05 3553 -0.90
HT1 2.12 2828 -0.91
HT2 2.05 3516 -0.90
run_1_clust_2.out: GLU 2 HN 2.12 1940 -0.90
GLY 1 HN 2.33 4030 -0.90
3 HN 2.43 3994 -0.90
HT1 2.11 2833 -0.91
HT2 2.05 3242 -0.90
I want to group columns E, F and G by columns B, C and D, to get output like this:
run_1_clust_1.out: GLY 1 HN 2.09 3196 -0.90
run_1_clust_2.out: GLY 1 HN 2.33 4030 -0.90
run_1_clust_1.out: GLU 2 HN 2.07 3851 -0.90
run_1_clust_2.out: GLU 2 HN 2.12 1940 -0.90
run_1_clust_1.out: GLY 3 HN 2.05 3553 -0.90
run_1_clust_2.out: GLY 3 HN 2.43 3994 -0.90
run_1_clust_1.out: GLY 3 HT1 2.12 2828 -0.91
....
I'm using pandas, but I'm not sure why I get an AttributeError telling me to use the 'apply' method.
import pandas as pd
writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
xl = pd.ExcelFile('test.xlsx')
df = xl.parse("Sheet1")
df.columns = df[['a','b','c','d','e','f','g']]
df = df.groupby(['b','c','d'])
df.to_excel(writer, sheet_name="Sheet1")
writer.save()
Answer 0 (score: 1)
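Try this. The main differences: I've specified a computation (a sum) to apply to each group, and I reset the index so that the output is a regular DataFrame.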
import pandas as pd
writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
xl = pd.ExcelFile('test.xlsx')
df = xl.parse("Sheet1")
df.columns = ['a','b','c','d','e','f','g']  # assign the names as a plain list
group_cols = ['b','c','d']
sum_cols = ['e', 'f', 'g']
df = df[group_cols+sum_cols].groupby(group_cols).sum().reset_index()
df.to_excel(writer, sheet_name="Sheet1")
writer.close()  # writes the file; writer.save() was removed in pandas 2.0
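To see why the original code fails, here is a minimal sketch on a small in-memory frame instead of the real workbook. The four rows are copied from the question. The blank cells in column a are an assumption about how the sheet is laid out, and the names `grouped`, `summed` and `listed` are only illustrative. `groupby` on its own returns a `DataFrameGroupBy` object, which has no `to_excel`; an aggregation such as `sum()` turns it back into a DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of Sheet1 (values taken from the question).
# Assumption: the file name in column a appears only on the first row of each block.
df = pd.DataFrame(
    {
        "a": ["run_1_clust_1.out:", None, "run_1_clust_2.out:", None],
        "b": ["GLU", "GLY", "GLU", "GLY"],
        "c": [2, 1, 2, 1],
        "d": ["HN", "HN", "HN", "HN"],
        "e": [2.07, 2.09, 2.12, 2.33],
        "f": [3851, 3196, 1940, 4030],
        "g": [-0.90, -0.90, -0.90, -0.90],
    }
)

group_cols = ["b", "c", "d"]
sum_cols = ["e", "f", "g"]

# groupby alone gives a DataFrameGroupBy, not a DataFrame,
# so it cannot be written with to_excel directly.
grouped = df.groupby(group_cols)

# Aggregating and resetting the index gives a regular DataFrame again.
summed = df[group_cols + sum_cols].groupby(group_cols).sum().reset_index()

# Possible alternative if the rows should be listed rather than summed:
# fill the blank file names downward, then sort by the key columns.
listed = df.copy()
listed["a"] = listed["a"].ffill()
listed = listed.sort_values(["c", "d", "a"]).reset_index(drop=True)

print(summed)
print(listed)
```

Note that `sum()` adds up E, F and G within each (B, C, D) group. The output shown in the question looks more like the original rows reordered, so the `ffill` plus `sort_values` variant may be closer to what was asked; which one is right depends on the real layout of the sheet.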