How to count values within pandas groups

Time: 2017-03-29 10:12:18

Tags: python pandas

Sorry if this is a silly question... I am trying to aggregate and count data in a pandas DataFrame:

m_supplier_name            m_eshop_order_number
1   Axxxx                  202-0147880-7096352
2   Bxxxx                  304-3926447-6272314
3   Bxxxx                  306-6312344-4699555
4   Cxxxx                  305-0612485-9987518
5   Cxxxx                  306-6591383-8935534
6   Cxxxx                  261620209654-1987667522016
7   Bxxxx                  306-3078039-1083529
8   Bxxxx                  305-6383385-7949949
9   Dxxxx                  P1824098
    …   …

This is my pandas query:

print(mdata.groupby(['m_supplier_name','m_order_number']).size())

This gives me:

m_supplier_name       m_order_number
Axxxx                 028-8286652-7920300           1
                      303-2594460-9759543           1
                      303-8120802-8687554           1
Bxxxx                 302-0806465-2917902           1
Cxxxx                 252589706005-2022820755015    1
                      302-1196139-4993924           1
                      303-5646700-0910740           1
                      304-1917413-4299541           1
                      305-6425830-6893162           1
                      306-6204117-1493102           1
Dxxxx                 028-7321508-2210762           1

But what I am actually looking for is the count per supplier:

m_supplier_name       m_order_number
Axxxx                 028-8286652-7920300           3
                      303-2594460-9759543           
                      303-8120802-8687554           
Bxxxx                 302-0806465-2917902           1
Cxxxx                 252589706005-2022820755015    6
                      302-1196139-4993924           
                      303-5646700-0910740           
                      304-1917413-4299541           
                      305-6425830-6893162           
                      306-6204117-1493102           
Dxxxx                 028-7321508-2210762           1

The goal is to store the count of m_order_numbers in a new column on every row of the original DataFrame:

m_supplier_name         m_eshop_order_number        m_orders_per_supplier
1   Axxxx               202-0147880-7096352         1
2   Bxxxx               304-3926447-6272314         4
3   Bxxxx               306-6312344-4699555         4
4   Cxxxx               305-0612485-9987518         3
5   Cxxxx               306-6591383-8935534         3
6   Cxxxx               261620209654-1987667522016  3
7   Bxxxx               306-3078039-1083529         4
8   Bxxxx               305-6383385-7949949         4
9   Dxxxx               P1824098                    1
    …   …

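1 answer:

Answer 0: (score: 0)

You can use transform, which computes a per-group value and broadcasts it back to every row of the original DataFrame: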

# Parentheses let the expression continue on the next line
mdata['m_orders_per_supplier'] = (mdata.groupby('m_supplier_name')['m_supplier_name']
                                       .transform('size'))
print(mdata)
  m_supplier_name        m_eshop_order_number  m_orders_per_supplier
1           Axxxx         202-0147880-7096352                      1
2           Bxxxx         304-3926447-6272314                      4
3           Bxxxx         306-6312344-4699555                      4
4           Cxxxx         305-0612485-9987518                      3
5           Cxxxx         306-6591383-8935534                      3
6           Cxxxx  261620209654-1987667522016                      3
7           Bxxxx         306-3078039-1083529                      4
8           Bxxxx         305-6383385-7949949                      4
9           Dxxxx                    P1824098                      1
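
For anyone who wants to try this out, here is a minimal self-contained sketch that rebuilds the nine sample rows from the question. The 1-based index is an assumption taken from the printed frame. The names `per_supplier` and `unique_orders` are illustrative only; the `nunique` variant is not part of the original answer and is shown in case the same order number can appear more than once for a supplier.

```python
import pandas as pd

# Sample rows copied from the question; the 1-based index is assumed
# from the way the frame is printed there.
mdata = pd.DataFrame(
    {
        "m_supplier_name": [
            "Axxxx", "Bxxxx", "Bxxxx", "Cxxxx", "Cxxxx",
            "Cxxxx", "Bxxxx", "Bxxxx", "Dxxxx",
        ],
        "m_eshop_order_number": [
            "202-0147880-7096352",
            "304-3926447-6272314",
            "306-6312344-4699555",
            "305-0612485-9987518",
            "306-6591383-8935534",
            "261620209654-1987667522016",
            "306-3078039-1083529",
            "305-6383385-7949949",
            "P1824098",
        ],
    },
    index=range(1, 10),
)

# Grouping by supplier only gives one row count per supplier.
# Grouping by supplier AND order number yields 1 everywhere,
# because each (supplier, order) pair occurs exactly once.
per_supplier = mdata.groupby("m_supplier_name").size()

# transform('size') broadcasts each group's row count back to its rows,
# so the result aligns with the original index.
mdata["m_orders_per_supplier"] = (
    mdata.groupby("m_supplier_name")["m_eshop_order_number"].transform("size")
)

# Illustrative variant: count distinct order numbers instead of rows.
unique_orders = (
    mdata.groupby("m_supplier_name")["m_eshop_order_number"].transform("nunique")
)

print(per_supplier)
print(mdata)
```

With this sample data both variants give the same numbers, since no order number repeats. The difference between `size` and `nunique` only matters when a supplier has duplicate order rows.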