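I have a dataset containing prices, dates, and cost types. The elements of the "cost type" column are separated by the '-' character. I want to sum the prices and group them into the categories A1, A2, A3, ... I have seen some pandas questions and answers on Stack Overflow, but each of them solves a different, specific problem.
The original DataFrame looks like this: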
price    date        cost type
14,000   1399/03/02  A11 - A1 -A
5,500    1399/02/25  A31 - A3 -A
67,500   1399/02/22  A21 - A2 -A
10,000   1399/02/20  A11 - A1 -A
8,000    1399/02/19  A12 - A1 -A
5,000    1399/02/19  A31 - A3 -A
8,000    1399/02/15  A12 - A1 -A
5,000    1399/02/12  A32 - A3 -A
14,000   1399/02/10  A13 - A1 -A
5,000    1399/02/09  A31 - A3 -A
2,000    1399/02/08  A33 - A3 -A
27,200   1399/02/03  A11 - A1 -A
66,500   1399/01/31  A21 - A2 -A
10,000   1399/01/20  A11 - A1 -A
10,000   1399/01/18  A12 - A1 -A
10,000   1399/01/18  A11 - A1 -A
8,000    1399/01/06  A12 - A1 -A
9,000    1399/01/04  A11 - A1 -A
20,000   1398/12/28  A14 - A1 -A
I want to sum the prices and group them by category.
The resulting DataFrame should look like this:
CostType(Main)  CostType(Branch)  Cost
A               A1                Sum of all elements (A11, A12, A13, …)
                A2                Sum of all elements (A21, A22, A23, …)
                A3                Sum of all elements (A31, A32, A33, …)
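Answer 0 (score: 0)
Use str.split() to split the column into its parts. Then concatenate the resulting columns with the original DataFrame, group by the new category columns, and sum the prices.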
import pandas as pd
import io
data = '''
price date "cost type"
14,000 1399/03/02 "A11 - A1 -A"
5,500 1399/02/25 "A31 - A3 -A"
67,500 1399/02/22 "A21 - A2 -A"
10,000 1399/02/20 "A11 - A1 -A"
8,000 1399/02/19 "A12 - A1 -A"
5,000 1399/02/19 "A31 - A3 -A"
8,000 1399/02/15 "A12 - A1 -A"
5,000 1399/02/12 "A32 - A3 -A"
14,000 1399/02/10 "A13 - A1 -A"
5,000 1399/02/09 "A31 - A3 -A"
2,000 1399/02/08 "A33 - A3 -A"
27,200 1399/02/03 "A11 - A1 -A"
66,500 1399/01/31 "A21 - A2 -A"
10,000 1399/01/20 "A11 - A1 -A"
10,000 1399/01/18 "A12 - A1 -A"
10,000 1399/01/18 "A11 - A1 -A"
8,000 1399/01/06 "A12 - A1 -A"
9,000 1399/01/04 "A11 - A1 -A"
20,000 1398/12/28 "A14 - A1 -A"
'''
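# sep=r'\s+' splits on whitespace; the quoted "cost type" values stay intact
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
# Remove the thousands separator so the prices can be summed as integers
df['price'] = df['price'].str.replace(',', '').astype(int)
# Split on '-' plus any surrounding whitespace so labels come out as 'A1', not ' A1 '
parts = df['cost type'].str.split(r'\s*-\s*', expand=True)
df2 = pd.concat([df[['price', 'date']], parts], axis=1)
df2 = df2.rename(columns={0: 'type_c', 1: 'type_b', 2: 'type_a'})
# Sum the prices per main category (type_a) and branch (type_b)
df2.groupby(['type_a', 'type_b'])['price'].sum().reset_index()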
type_a type_b price
0 A A1 148200
1 A A2 134000
2 A A3 22500
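If this needs to be reused, the same split, group, and sum steps could be wrapped in a small helper. The following is only a sketch: the function name sum_by_branch, its parameters, and the output column names main, branch, and cost are illustrative assumptions, not part of the original question or answer. It also assumes every "cost type" value has exactly three parts in the form "leaf - branch - main".

```python
import pandas as pd


def sum_by_branch(df, type_col="cost type", price_col="price"):
    """Sum prices per (main, branch) pair from a 'leaf - branch - main' column.

    Hypothetical helper: the name and the column labels are illustrative.
    """
    # Split on '-' and discard the whitespace around it in one step.
    parts = df[type_col].str.split(r"\s*-\s*", expand=True)
    parts.columns = ["leaf", "branch", "main"]  # assumes exactly three parts
    # Prices such as '14,000' are strings; drop the thousands separator.
    cost = df[price_col].astype(str).str.replace(",", "", regex=False).astype(int)
    out = parts.assign(cost=cost)
    return out.groupby(["main", "branch"], as_index=False)["cost"].sum()


# A four-row excerpt of the data from the question.
sample = pd.DataFrame({
    "price": ["14,000", "5,500", "67,500", "10,000"],
    "cost type": ["A11 - A1 -A", "A31 - A3 -A", "A21 - A2 -A", "A11 - A1 -A"],
})
result = sum_by_branch(sample)
print(result)
```

Passing as_index=False keeps main and branch as ordinary columns, which is closer to the layout asked for in the question than a MultiIndex result.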