Question

I want to apply an aggregate function (sum()) to a variable ("PurchAmount"), grouping first by the dimension "Customer". At the same time, I want to select the "Quantity" column.

In R, this can be done with:

myData[, list(Quantity, AggPurch=sum(PurchAmount)), by=Customer]

Is there a similar solution for a Pandas DataFrame in Python?

Answer 1

You can use '.groupby' to split a pandas object into groups:

http://pandas.pydata.org/pandas-docs/stable/groupby.html#splitting-an-object-into-groups

import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], 
    'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'], 
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}

df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])

df['preTestScore'].groupby([df['regiment'], df['company']]).mean()

The last expression outputs:

regiment    company
Dragoons    1st         3.5
            2nd        27.5
Nighthawks  1st        14.0
            2nd        16.5
Scouts      1st         2.5
            2nd         2.5
dtype: float64

Example taken from: http://chrisalbon.com/python/pandas_apply_operations_to_groups.html
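
Applied to the columns named in the question, one possible approach is sketched below. The sample data is made up for illustration, and the sketch assumes the goal matches the R call: keep each row's Quantity and attach the per-customer sum of PurchAmount. Note that transform keeps the original row order, whereas the R call returns rows grouped by Customer. The second part (one row per customer) uses named aggregation, which needs pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
my_data = pd.DataFrame({
    "Customer": ["A", "A", "B", "B", "B"],
    "Quantity": [1, 2, 3, 4, 5],
    "PurchAmount": [10.0, 20.0, 5.0, 5.0, 15.0],
})

# Keep every row's Quantity and add the per-customer total of PurchAmount.
# transform("sum") returns a Series aligned with the original rows,
# so the group total is repeated on each row of that group.
result = my_data[["Customer", "Quantity"]].assign(
    AggPurch=my_data.groupby("Customer")["PurchAmount"].transform("sum")
)
print(result)

# Alternative, if one row per customer is wanted instead:
# aggregate both columns with named aggregation.
summary = my_data.groupby("Customer", as_index=False).agg(
    Quantity=("Quantity", "sum"),
    AggPurch=("PurchAmount", "sum"),
)
print(summary)
```

Which of the two fits depends on whether Quantity should stay at row level (first form) or be aggregated as well (second form).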

Group by/aggregate and select at the same time in a Pandas DataFrame
