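I'm using pandas in Python to pivot some data, and I'd like to perform two kinds of aggregation across parts of the pivot table. I know I can use margins to total all rows/columns, but I want to aggregate several rows (not all of them) within a single column, or several columns within a single row. What is the best way to total sub-rows and sub-columns in pandas?
Example code setup: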
import pandas as pds
#Dataset
rows = [
[1, 'Factory_1', 'crusher', 'electricity_usage', 15],
[2, 'Factory_1', 'mixer', 'electricity_usage', 11],
[3, 'Factory_1', 'turner', 'electricity_usage', 12],
[4, 'Factory_2', 'crusher', 'electricity_usage', 2],
[5, 'Factory_2', 'mixer', 'electricity_usage', 7],
[6, 'Factory_2', 'turner', 'electricity_usage', 13],
[7, 'Factory_1', 'crusher', 'running_hours', 6],
[8, 'Factory_1', 'mixer', 'running_hours', 5],
[9, 'Factory_1', 'turner', 'running_hours', 5],
[10, 'Factory_2', 'crusher', 'running_hours', 1],
[11, 'Factory_2', 'mixer', 'running_hours', 3],
[12, 'Factory_2', 'turner', 'running_hours', 6]
]
dataFrame = pds.DataFrame(rows, columns=["id","Location","Type","recorded_type","value"])
#Pivot Table 1: Form multi row aggregation across a single column
ptable_1 = pds.pivot_table(data=dataFrame,index=['Location', 'Type'], columns=["recorded_type"], values=['value'])
print(ptable_1)
#Pivot Table 2: Form multi column aggregation across a single row
ptable_2 = pds.pivot_table(data=dataFrame,index=['recorded_type'], columns=["Location", "Type"], values=['value'])
print(ptable_2)
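Below, I tried to aggregate pivot table 1 across multiple rows within a single column. I'm trying to sum the recorded values of all machines at each location. Can this be done better?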
#Form aggregation across multiple rows in a single column
df1 = ptable_1.groupby(level=[0]).sum()
df1['Type'] = ["all", "all"]
#Reset index so Location is removed from the current index
df1.reset_index(inplace=True)
#Set multi-index of location and type
df1.set_index(['Location', 'Type'], inplace=True)
#Concat both dataframes
aggregated_table_1 = pds.concat([ptable_1.reset_index(),df1.reset_index()], ignore_index=True)
#Sort values by location, so appended table values are in the correct position
aggregated_table_1.sort_values('Location', inplace=True)
print(aggregated_table_1)
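For example, I'm trying to total the electricity usage of all machine types at a given factory, so the aggregate goes in the 'Type' column under the type 'all'.
Expected output for ptable_1: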
+---------------+-----------+---------+-------------------+---------------+
| | Location | Type | value | value |
+---------------+-----------+---------+-------------------+---------------+
| recorded_type | | | electricity_usage | running_hours |
| | Factory_1 | crusher | 15 | 6 |
| | Factory_1 | mixer | 11 | 5 |
| | Factory_1 | turner | 12 | 5 |
| | Factory_1 | all | 38 | 16 |
| | Factory_2 | crusher | 2 | 1 |
| | Factory_2 | mixer | 7 | 3 |
| | Factory_2 | turner | 13 | 6 |
| | Factory_2 | all | 22 | 10 |
+---------------+-----------+---------+-------------------+---------------+
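For reference, here is a minimal sketch of one way to get this layout. It rests on a few assumptions that are not in the original post: the pivot is built with `values="value"` (a string, which keeps the columns single-level) and `aggfunc="sum"`, and the names `table`, `pieces`, and `with_totals` are purely illustrative. Each location's block is followed directly by its one-row total labelled 'all', so no re-sorting is needed afterwards.

```python
import pandas as pd

rows = [
    [1, "Factory_1", "crusher", "electricity_usage", 15],
    [2, "Factory_1", "mixer", "electricity_usage", 11],
    [3, "Factory_1", "turner", "electricity_usage", 12],
    [4, "Factory_2", "crusher", "electricity_usage", 2],
    [5, "Factory_2", "mixer", "electricity_usage", 7],
    [6, "Factory_2", "turner", "electricity_usage", 13],
    [7, "Factory_1", "crusher", "running_hours", 6],
    [8, "Factory_1", "mixer", "running_hours", 5],
    [9, "Factory_1", "turner", "running_hours", 5],
    [10, "Factory_2", "crusher", "running_hours", 1],
    [11, "Factory_2", "mixer", "running_hours", 3],
    [12, "Factory_2", "turner", "running_hours", 6],
]
df = pd.DataFrame(rows, columns=["id", "Location", "Type", "recorded_type", "value"])

# Assumption: values="value" (a string, not a list) keeps the columns single-level.
table = pd.pivot_table(
    df, index=["Location", "Type"], columns="recorded_type",
    values="value", aggfunc="sum",
)

pieces = []
for location, block in table.groupby(level="Location"):
    # Collapse this location's rows into one row labelled (location, "all").
    total = block.sum().to_frame().T
    total.index = pd.MultiIndex.from_tuples(
        [(location, "all")], names=table.index.names
    )
    # Append the block first and its total second, so "all" lands last.
    pieces.extend([block, total])

with_totals = pd.concat(pieces)
print(with_totals)
```

Building the result block by block avoids a sort step, which would otherwise place 'all' ahead of 'crusher' alphabetically.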
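Second, I'm not sure how to aggregate across sub-columns, as shown below, to get the sum of all the machine-type columns within each location of ptable_2. The aggregate is a new column whose Type is 'all'.
Expected output for ptable_2: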
+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| Location | Factory_1 | Factory_1 | Factory_1 | Factory_1 | Factory_2 | Factory_2 | Factory_2 | Factory_2 |
+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| Type | crusher | mixer | turner | all | crusher | mixer | turner | all |
| recorded_type | | | | | | | | |
| electricity_usage | 15 | 11 | 12 | 38 | 2 | 7 | 13 | 22 |
| running_hours | 6 | 5 | 5 | 16 | 1 | 3 | 6 | 10 |
+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
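The column-wise case can be sketched in a similar spirit. The following is only an illustration under stated assumptions: the pivot again uses `values="value"` and `aggfunc="sum"`, and `wide`, `blocks`, and `wide_with_totals` are hypothetical names. For each location, the matching columns are selected with `DataFrame.xs`, a row-wise sum is added as a new `(location, 'all')` column, and the blocks are concatenated side by side.

```python
import pandas as pd

rows = [
    [1, "Factory_1", "crusher", "electricity_usage", 15],
    [2, "Factory_1", "mixer", "electricity_usage", 11],
    [3, "Factory_1", "turner", "electricity_usage", 12],
    [4, "Factory_2", "crusher", "electricity_usage", 2],
    [5, "Factory_2", "mixer", "electricity_usage", 7],
    [6, "Factory_2", "turner", "electricity_usage", 13],
    [7, "Factory_1", "crusher", "running_hours", 6],
    [8, "Factory_1", "mixer", "running_hours", 5],
    [9, "Factory_1", "turner", "running_hours", 5],
    [10, "Factory_2", "crusher", "running_hours", 1],
    [11, "Factory_2", "mixer", "running_hours", 3],
    [12, "Factory_2", "turner", "running_hours", 6],
]
df = pd.DataFrame(rows, columns=["id", "Location", "Type", "recorded_type", "value"])

# Assumption: values="value" gives two column levels only (Location, Type).
wide = pd.pivot_table(
    df, index="recorded_type", columns=["Location", "Type"],
    values="value", aggfunc="sum",
)

blocks = []
for location in wide.columns.get_level_values("Location").unique():
    # drop_level=False keeps both column levels so the blocks can be rejoined.
    block = wide.xs(location, axis=1, level="Location", drop_level=False).copy()
    # Sum across this location's machine types into a new "all" column.
    block[(location, "all")] = block.sum(axis=1)
    blocks.append(block)

wide_with_totals = pd.concat(blocks, axis=1)
print(wide_with_totals)
```

Since the second pivot is the transpose of the first, transposing a row-subtotalled table would be another option; the loop above simply keeps the column logic explicit.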
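Edit 1: Here is my output, straight from Python, using Serge de Gosson de Varennes's approach of melt() with default arguments. I lose the recorded_type record for each row, and a NaN column appears instead. Should I try to aggregate from this to build the expected output?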
Df_ex1 = dfex1.melt() # Expected output 1
NaN recorded_type value
0 value electricity_usage 15
1 value electricity_usage 11
2 value electricity_usage 12
3 value electricity_usage 2
4 value electricity_usage 7
5 value electricity_usage 13
6 value running_hours 6
7 value running_hours 5
8 value running_hours 5
9 value running_hours 1
10 value running_hours 3
11 value running_hours 6
Df_exp2 = dfex2.melt() # Expected output 2
NaN Location Type value
0 value Factory_1 crusher 15
1 value Factory_1 crusher 6
2 value Factory_1 mixer 11
3 value Factory_1 mixer 5
4 value Factory_1 turner 12
5 value Factory_1 turner 5
6 value Factory_2 crusher 2
7 value Factory_2 crusher 1
8 value Factory_2 mixer 7
9 value Factory_2 mixer 3
10 value Factory_2 turner 13
11 value Factory_2 turner 6
Answer 0 (score: 0)
You're almost there: you need to melt the dataframe:
import pandas as pds
rows = [
[1, 'Factory_1', 'crusher', 'electricity_usage', 15],
[2, 'Factory_1', 'mixer', 'electricity_usage', 11],
[3, 'Factory_1', 'turner', 'electricity_usage', 12],
[4, 'Factory_2', 'crusher', 'electricity_usage', 2],
[5, 'Factory_2', 'mixer', 'electricity_usage', 7],
[6, 'Factory_2', 'turner', 'electricity_usage', 13],
[7, 'Factory_1', 'crusher', 'running_hours', 6],
[8, 'Factory_1', 'mixer', 'running_hours', 5],
[9, 'Factory_1', 'turner', 'running_hours', 5],
[10, 'Factory_2', 'crusher', 'running_hours', 1],
[11, 'Factory_2', 'mixer', 'running_hours', 3],
[12, 'Factory_2', 'turner', 'running_hours', 6]
]
dataFrame = pds.DataFrame(rows, columns=["id","Location","Type","recorded_type","value"])
ptable_1 = pds.pivot_table(data=dataFrame,index=['Location', 'Type'], columns=["recorded_type"], values=['value'])
ptable_2 = pds.pivot_table(data=dataFrame,index=['recorded_type'], columns=["Location", "Type"], values=['value'])
df = pds.DataFrame(ptable_1)
dfex1 = pds.DataFrame(ptable_1)
dfex2 = pds.DataFrame(ptable_2)
This gives you:
Df_ex1 = dfex1.melt() # Expected output 1
Df_exp2 = dfex2.melt() # Expected output 2
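A note on the lost row labels and the NaN column reported in Edit 1, offered as a hedged sketch rather than as part of the answer above: `melt()` discards the index by default, and the unnamed (NaN) column comes from the extra column level that `values=['value']` creates. Assuming pandas 1.1 or later, where `melt` accepts `ignore_index`, the index can be kept and then turned back into a column. The pivot below passes `values="value"` as a string to avoid the extra level; `long_2` is an illustrative name.

```python
import pandas as pd

rows = [
    [1, "Factory_1", "crusher", "electricity_usage", 15],
    [2, "Factory_1", "mixer", "electricity_usage", 11],
    [3, "Factory_1", "turner", "electricity_usage", 12],
    [4, "Factory_2", "crusher", "electricity_usage", 2],
    [5, "Factory_2", "mixer", "electricity_usage", 7],
    [6, "Factory_2", "turner", "electricity_usage", 13],
    [7, "Factory_1", "crusher", "running_hours", 6],
    [8, "Factory_1", "mixer", "running_hours", 5],
    [9, "Factory_1", "turner", "running_hours", 5],
    [10, "Factory_2", "crusher", "running_hours", 1],
    [11, "Factory_2", "mixer", "running_hours", 3],
    [12, "Factory_2", "turner", "running_hours", 6],
]
df = pd.DataFrame(rows, columns=["id", "Location", "Type", "recorded_type", "value"])

# Assumption: a string for `values` avoids the extra unnamed column level.
ptable_2 = pd.pivot_table(
    df, index="recorded_type", columns=["Location", "Type"],
    values="value", aggfunc="sum",
)

# ignore_index=False keeps recorded_type as the index while melting;
# reset_index() then turns it back into an ordinary column.
long_2 = ptable_2.melt(ignore_index=False).reset_index()
print(long_2)
```

The result has one row per (recorded_type, Location, Type) combination, which is essentially the original long data again, so the subtotals still have to be computed separately.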