Sorry if this is a silly question... I'm trying to aggregate and count data in a pandas DataFrame:
m_supplier_name m_eshop_order_number
1 Axxxx 202-0147880-7096352
2 Bxxxx 304-3926447-6272314
3 Bxxxx 306-6312344-4699555
4 Cxxxx 305-0612485-9987518
5 Cxxxx 306-6591383-8935534
6 Cxxxx 261620209654-1987667522016
7 Bxxxx 306-3078039-1083529
8 Bxxxx 305-6383385-7949949
9 Dxxxx P1824098
… …
Here is my pandas query:
print(mdata.groupby(['m_supplier_name', 'm_order_number']).size())
This gives me:
m_supplier_name m_order_number
Axxxx 028-8286652-7920300 1
303-2594460-9759543 1
303-8120802-8687554 1
Bxxxx 302-0806465-2917902 1
Cxxxx 252589706005-2022820755015 1
302-1196139-4993924 1
303-5646700-0910740 1
304-1917413-4299541 1
305-6425830-6893162 1
306-6204117-1493102 1
Dxxxx 028-7321508-2210762 1
But what I'm actually looking for is the count per supplier:
m_supplier_name m_order_number
Axxxx 028-8286652-7920300 3
303-2594460-9759543
303-8120802-8687554
Bxxxx 302-0806465-2917902 1
Cxxxx 252589706005-2022820755015 6
302-1196139-4993924
303-5646700-0910740
304-1917413-4299541
305-6425830-6893162
306-6204117-1493102
Dxxxx 028-7321508-2210762 1
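The difference comes down to the group keys. Grouping by both columns makes every (supplier, order) pair its own group, so each size is 1; grouping by the supplier alone counts the rows per supplier. A minimal sketch, assuming only the nine sample rows shown at the top (the elided "…" rows are left out) and using the column name `m_eshop_order_number` from the sample table rather than `m_order_number` from the query:

```python
import pandas as pd

# The nine sample rows from the question; the elided "…" rows are omitted.
mdata = pd.DataFrame(
    {
        "m_supplier_name": ["Axxxx", "Bxxxx", "Bxxxx", "Cxxxx", "Cxxxx",
                            "Cxxxx", "Bxxxx", "Bxxxx", "Dxxxx"],
        "m_eshop_order_number": [
            "202-0147880-7096352", "304-3926447-6272314",
            "306-6312344-4699555", "305-0612485-9987518",
            "306-6591383-8935534", "261620209654-1987667522016",
            "306-3078039-1083529", "305-6383385-7949949", "P1824098",
        ],
    },
    index=range(1, 10),
)

# One group per (supplier, order) pair: every size is 1 here.
per_pair = mdata.groupby(["m_supplier_name", "m_eshop_order_number"]).size()

# One group per supplier: the size is the number of rows for that supplier.
per_supplier = mdata.groupby("m_supplier_name").size()
print(per_supplier)
```

This returns one aggregated row per supplier; it does not yet put the count back on the original rows.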
The goal is to store the count of m_order_numbers per supplier in a new column on every row of the original DataFrame:
m_supplier_name m_eshop_order_number m_orders_per_supplier
1 Axxxx 202-0147880-7096352 1
2 Bxxxx 304-3926447-6272314 4
3 Bxxxx 306-6312344-4699555 4
4 Cxxxx 305-0612485-9987518 3
5 Cxxxx 306-6591383-8935534 3
6 Cxxxx 261620209654-1987667522016 3
7 Bxxxx 306-3078039-1083529 4
8 Bxxxx 305-6383385-7949949 4
9 Dxxxx P1824098 1
… …
Answer:
You can use transform:
# Count the rows of each supplier and broadcast that count back to every row
mdata['m_orders_per_supplier'] = (mdata.groupby('m_supplier_name')['m_supplier_name']
                                  .transform('size'))
print(mdata)
m_supplier_name m_eshop_order_number m_orders_per_supplier
1 Axxxx 202-0147880-7096352 1
2 Bxxxx 304-3926447-6272314 4
3 Bxxxx 306-6312344-4699555 4
4 Cxxxx 305-0612485-9987518 3
5 Cxxxx 306-6591383-8935534 3
6 Cxxxx 261620209654-1987667522016 3
7 Bxxxx 306-3078039-1083529 4
8 Bxxxx 305-6383385-7949949 4
9 Dxxxx P1824098 1
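A self-contained sketch of the same idea, again assuming only the nine sample rows (no "…" rows) and the column name `m_eshop_order_number` from the sample table. `transform('size')` returns a Series aligned to the original index, so it can be assigned directly as a new column. The `value_counts` plus `map` line is an alternative shown for comparison, not part of the answer above. If the same order number could appear on several rows and you want distinct orders instead of rows, `transform('nunique')` on the order column is the variant to use:

```python
import pandas as pd

# The nine sample rows from the question; the elided "…" rows are omitted.
mdata = pd.DataFrame(
    {
        "m_supplier_name": ["Axxxx", "Bxxxx", "Bxxxx", "Cxxxx", "Cxxxx",
                            "Cxxxx", "Bxxxx", "Bxxxx", "Dxxxx"],
        "m_eshop_order_number": [
            "202-0147880-7096352", "304-3926447-6272314",
            "306-6312344-4699555", "305-0612485-9987518",
            "306-6591383-8935534", "261620209654-1987667522016",
            "306-3078039-1083529", "305-6383385-7949949", "P1824098",
        ],
    },
    index=range(1, 10),
)

# Rows per supplier, repeated on every row of that supplier.
mdata["m_orders_per_supplier"] = (
    mdata.groupby("m_supplier_name")["m_supplier_name"].transform("size")
)

# Alternative: count each supplier once, then look the count up for each row.
alt = mdata["m_supplier_name"].map(mdata["m_supplier_name"].value_counts())

# Variant: distinct order numbers per supplier (same result here because
# every sample order number is unique).
distinct = mdata.groupby("m_supplier_name")["m_eshop_order_number"].transform("nunique")

print(mdata)
```

`transform` keeps the group logic in one call; the `map` version may read more naturally when you only need a plain frequency count of one column.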