I have a pandas DataFrame like this:
import pandas as pd
import numpy as np
data = {
    "Type": ["A", "A", "B", "B", "B"],
    "Project": ["X123", "X123", "X21", "L31", "L31"],
    "Number": [100, 300, 100, 200, 500],
    "Status": ['Y', 'Y', 'N', 'Y', 'N']
}
df = pd.DataFrame.from_dict(data)
I want to group by Type and get counts and sums under several conditions, ending up with a result like this:
Type Total_Count Total_Number Count_Status=Y Number_Status=Y Count_Status=N Number_Status=N
A 2 400 2 400 0 0
B 3 800 1 200 2 600
I tried the following, but it is not quite what I need. Please share any ideas you may have. Thanks!
df1 = pd.pivot_table(df, index = 'Type', values = 'Number', aggfunc = np.sum)
df2 = pd.pivot_table(df, index = 'Type', values = 'Project', aggfunc = 'count')
pd.concat([df1, df2], axis=1)
Answer 0 (score: 5)
If you want to write a function:
def my_agg(x):
    names = {
        'Total_Count': x['Type'].count(),
        'Total_Number': x['Number'].sum(),
        'Count_Status=Y': x[x['Status']=='Y']['Type'].count(),
        'Number_Status=Y': x[x['Status']=='Y']['Number'].sum(),
        'Count_Status=N': x[x['Status']=='N']['Type'].count(),
        'Number_Status=N': x[x['Status']=='N']['Number'].sum()}
    return pd.Series(names)
df.groupby('Type').apply(my_agg)
Total_Count Total_Number Count_Status=Y Number_Status=Y Count_Status=N Number_Status=N
Type
A 2 400 2 400 0 0
B 3 800 1 200 2 600
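The same table can also be built without apply. The following is only a sketch, not part of the original answer: it assumes named aggregation is available (pandas 0.25 or later) and introduces helper columns (is_y, is_n, num_y, num_n) whose names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "B", "B", "B"],
    "Project": ["X123", "X123", "X21", "L31", "L31"],
    "Number": [100, 300, 100, 200, 500],
    "Status": ["Y", "Y", "N", "Y", "N"],
})

# Hypothetical helper columns: a 0/1 flag per status, and Number
# masked to zero wherever the status does not match.
tmp = df.assign(
    is_y=(df["Status"] == "Y").astype(int),
    is_n=(df["Status"] == "N").astype(int),
)
tmp["num_y"] = tmp["Number"] * tmp["is_y"]
tmp["num_n"] = tmp["Number"] * tmp["is_n"]

# Output names containing "=" are not valid keyword names, so the
# named aggregations are passed as an unpacked dict.
result = tmp.groupby("Type").agg(**{
    "Total_Count": ("Number", "count"),
    "Total_Number": ("Number", "sum"),
    "Count_Status=Y": ("is_y", "sum"),
    "Number_Status=Y": ("num_y", "sum"),
    "Count_Status=N": ("is_n", "sum"),
    "Number_Status=N": ("num_n", "sum"),
})
print(result)
```

This keeps every aggregation vectorized, which may matter on larger frames, at the cost of building the helper columns first.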
Answer 1 (score: 4)
Start with pivot_table:
pv = (df.pivot_table(index='Type',
                     columns='Status',
                     values='Number',
                     aggfunc='sum')
        .add_prefix('Number_Status='))
print(pv)
Status Number_Status=N Number_Status=Y
Type
A NaN 400.0
B 600.0 200.0
Next, groupby:
totals = df.groupby('Type').Number.agg([
    ('Total_Count', 'count'), ('Total_Number', 'sum')])
print(totals)
Total_Count Total_Number
Type
A 2 400
B 3 800
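Finally, count the statuses with one-hot encoding (OHE):
cnts = (df.set_index('Type').Status
          .str.get_dummies()
          .groupby(level=0).sum()  # sum(level=0) was removed in pandas 2.0
          .add_prefix('Count_Status='))
print(cnts)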
Count_Status=N Count_Status=Y
Type
A 0 2
B 2 1
Put them together:
pd.concat([pv, totals, cnts], axis=1).sort_index(axis=1)
Count_Status=N Count_Status=Y Number_Status=N Number_Status=Y \
Type
A 0 2 NaN 400.0
B 2 1 600.0 200.0
Total_Count Total_Number
Type
A 2 400
B 3 800
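The NaN for type A comes from the missing (A, N) combination, and it also turns the Number_Status columns into floats. If zeros are wanted, as in the requested output, one option is pivot_table's fill_value parameter followed by an explicit integer cast. This is a sketch under that assumption, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "B", "B", "B"],
    "Project": ["X123", "X123", "X21", "L31", "L31"],
    "Number": [100, 300, 100, 200, 500],
    "Status": ["Y", "Y", "N", "Y", "N"],
})

# fill_value=0 replaces the missing (A, N) cell; astype(int) makes the
# integer dtype explicit rather than relying on pivot_table's behaviour.
pv = (df.pivot_table(index="Type",
                     columns="Status",
                     values="Number",
                     aggfunc="sum",
                     fill_value=0)
        .astype(int)
        .add_prefix("Number_Status="))
print(pv)
```

The rest of the answer (totals, cnts and the final concat) should work unchanged with this pv.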
Answer 2 (score: 2)
One way to do it:
s1=df.groupby('Type').Number.agg(['count','sum'])
s2=df.groupby(['Type','Status']).Number.agg(['count','sum']).unstack(fill_value=0).sort_index(level=1,axis=1)
s2.columns=s2.columns.map('_Status='.join)
s1=s1.add_prefix('Total_')
s=pd.concat([s1,s2],axis=1)
s
Total_count Total_sum count_Status=N sum_Status=N count_Status=Y \
Type
A 2 400 0 0 2
B 3 800 2 600 1
sum_Status=Y
Type
A 400
B 200
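The column names above (Total_count, sum_Status=N, ...) differ slightly from the ones requested in the question. A possible follow-up step, sketched here with a hypothetical helper to_requested_name, renames and reorders them:

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "B", "B", "B"],
    "Project": ["X123", "X123", "X21", "L31", "L31"],
    "Number": [100, 300, 100, 200, 500],
    "Status": ["Y", "Y", "N", "Y", "N"],
})

# Same steps as in the answer above.
s1 = df.groupby("Type")["Number"].agg(["count", "sum"]).add_prefix("Total_")
s2 = (df.groupby(["Type", "Status"])["Number"]
        .agg(["count", "sum"])
        .unstack(fill_value=0)
        .sort_index(level=1, axis=1))
s2.columns = s2.columns.map("_Status=".join)
s = pd.concat([s1, s2], axis=1)

# Hypothetical helper: map 'count'/'sum' to the question's 'Count'/'Number'.
labels = {"count": "Count", "sum": "Number"}

def to_requested_name(col):
    head, _, tail = col.partition("_")
    if head == "Total":              # e.g. 'Total_count' -> 'Total_Count'
        return "Total_" + labels[tail]
    return labels[head] + "_" + tail  # e.g. 'sum_Status=N' -> 'Number_Status=N'

order = ["Total_Count", "Total_Number",
         "Count_Status=Y", "Number_Status=Y",
         "Count_Status=N", "Number_Status=N"]
s = s.rename(columns=to_requested_name)[order]
print(s)
```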
Answer 3 (score: 2)
You can use the margins parameter of pd.pivot_table. Only the row-wise margins (the Total columns) are needed, so the Total row is dropped at the end.
import pandas as pd
df1 = df.pivot_table(index='Type', columns='Status', values='Number',
                     aggfunc=['sum', 'count'],
                     margins=True,
                     margins_name='Total').fillna(0).drop('Total')
#         sum              count
#Status     N      Y Total     N    Y Total
#Type
#A        0.0  400.0   400   0.0  2.0     2
#B      600.0  200.0   800   2.0  1.0     3
Rename the columns if needed:
d = {'Y': 'Status=Y', 'N': 'Status=N', 'Total': 'Total'}
df1.columns = [f'{x}_{d.get(y)}' for x, y in df1.columns]
print(df1)
Answer 4 (score: 1)
You can use pandas.core.groupby.GroupBy.apply for this task. For example, once you have the GroupBy object, you can write a function that processes the columns of each group.
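def compute_metrics(x):
    # (mask).sum() counts the rows where Status is "Y"; len(mask) would count every row in the group
    result = {'Total_Number': x['Number'].sum(), 'Count_Status=Y': (x['Status'] == "Y").sum()}
    return pd.Series(result)
Then df.groupby('Type').apply(compute_metrics) returns a DataFrame like this:
Total_Number Count_Status=Y
Type
A 400 2
B 800 1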
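One thing to watch when counting rows that match a condition: len() of a boolean mask returns the number of rows in the group, while .sum() counts only the True values. A small illustration using the Status values of group B from the question:

```python
import pandas as pd

# Status values of the three rows with Type == "B" in the question's data.
status = pd.Series(["N", "Y", "N"])
mask = status == "Y"

n_rows = len(mask)       # length of the mask: every row is counted
n_yes = int(mask.sum())  # True counts as 1, so only matching rows are counted
print(n_rows, n_yes)     # prints: 3 1
```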
Hope this helps.
Cheers.