Here is a sample of the data I am working with:
SCENARIO DATE POD AREA IDOC STATUS TYPE
AAA 02.06.2015 JKJKJKJKJKK 4210 713375 51 1
AAA 02.06.2015 JWERWERE 4210 713375 51 1
AAA 02.06.2015 JAFDFDFDFD 4210 713375 51 9
BBB 02.06.2015 AAAAAAAA 5400 713504 51 43
CCC 05.06.2015 BBBBBBBBBB 4100 756443 51 187
AAA 05.06.2015 EEEEEEEE 4100 756457 53 228
I wrote the following groupby code in pandas:
import pandas as pd
import numpy as np
xl = pd.ExcelFile("MRD.xlsx")
df = xl.parse("Sheet3")
# print(df.columns.values)
# The following gave ValueError: Cannot label index with a null key
# dfi = df.pivot('SCENARIO')
# Here I do not actually need it to count every column, just a specific one
table = df.groupby(["SCENARIO", "STATUS", "TYPE"]).agg(['count'])
writer = pd.ExcelWriter('pandas.out.xlsx', engine='xlsxwriter')
table.to_excel(writer, sheet_name='Sheet1')
writer.save()
table2 = pd.DataFrame(df.groupby(["SCENARIO", "STATUS", "TYPE"])['TYPE'].count())
print (table2)
writer2 = pd.ExcelWriter('pandas2.out.xlsx', engine='xlsxwriter')
table2.to_excel(writer2, sheet_name='Sheet1')
writer2.save()
This produces the following result:
SCENARIO STATUS TYPE TYPE
AAA 51 1 2
9 1
53 228 1
BBB 51 43 1
CCC 51 187 1
Name: TYPE, dtype: int64
How can I add a subtotal for each group? Ideally, I would like to end up with something like this:
SCENARIO STATUS TYPE TYPE
AAA 51 1 2
9 1
Total 3
53 228 1
Total 1
BBB 51 43 1
Total 1
CCC 51 187 1
Total 1
Name: TYPE, dtype: int64
Is this possible?
Answer 0 (score: 5)
Use:
# if necessary, convert the TYPE column to string
df['TYPE'] = df['TYPE'].astype(str)
df = df.groupby(["SCENARIO", "STATUS", "TYPE"])['TYPE'].count()
# aggregate the sum over the first 2 levels
df1 = df.groupby(["SCENARIO", "STATUS"]).sum()
# add a third MultiIndex level
df1.index = [df1.index.get_level_values(0),
             df1.index.get_level_values(1),
             ['Total'] * len(df1)]
# thanks to MaxU for the improvement
# df1 = df1.set_index(np.array(['Total'] * len(df1)), append=True)
print (df1)
SCENARIO STATUS
AAA 51 Total 3
53 Total 1
BBB 51 Total 1
CCC 51 Total 1
Name: TYPE, dtype: int64
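The output above contains only the subtotal rows. One possible way to interleave them with the per-group counts is to concatenate both Series and sort the index. This is a minimal sketch, not necessarily the answerer's exact method. It assumes the sample data from the question is rebuilt inline instead of read from MRD.xlsx, the names counts, subtotals and result are illustrative, and size() is used for the counts, which matches count() when there are no missing values.

```python
import pandas as pd

# Sample data from the question, rebuilt inline (assumption: only the
# three grouping columns matter for the counts).
df = pd.DataFrame({
    "SCENARIO": ["AAA", "AAA", "AAA", "BBB", "CCC", "AAA"],
    "STATUS": [51, 51, 51, 51, 51, 53],
    "TYPE": [1, 1, 9, 43, 187, 228],
})

# TYPE must be a string so that it can be sorted alongside the 'Total' label.
df["TYPE"] = df["TYPE"].astype(str)

# Row count per (SCENARIO, STATUS, TYPE) group.
counts = df.groupby(["SCENARIO", "STATUS", "TYPE"]).size()

# Subtotal per (SCENARIO, STATUS), labelled 'Total' on a third index level.
subtotals = counts.groupby(level=["SCENARIO", "STATUS"]).sum()
subtotals.index = pd.MultiIndex.from_arrays(
    [
        subtotals.index.get_level_values(0),
        subtotals.index.get_level_values(1),
        ["Total"] * len(subtotals),
    ],
    names=["SCENARIO", "STATUS", "TYPE"],
)

# Stack both Series and sort so each 'Total' row follows its group.
result = pd.concat([counts, subtotals]).sort_index(level=[0, 1])
print(result)
```

The ordering relies on digit strings sorting before the capital letter in 'Total', so a different label (for example one starting with a digit) could end up in the middle of a group.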
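Answer 1 (score: 0)
Chris Moffitt created a library called sidetable to simplify this process. It exposes an accessor that can be used on the result of a groupby, which makes adding subtotals very simple. That said, the accepted answer and its comments are a gold mine, and I think they are worth reading first.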