Question

I have the following DataFrame:

   col1    col2  col3
0   tom     2    cash
1   tom     3    gas
2   tom     5    online
3   jerry   1    online
4   jerry   4    online
5   jerry   5    gas
6   scooby  8    cash
7   scooby  6    dogfood
8   scooby  1    cheese

It can be reproduced with:

import pandas as pd

data = {'col1': ['tom', 'tom', 'tom', 'jerry', 'jerry', 'jerry', 'scooby', 'scooby', 'scooby'],
        'col2': [2, 3, 5, 1, 4, 5, 8, 6, 1],
        'col3': ['cash', 'gas', 'online', 'online', 'online', 'gas', 'cash', 'dogfood', 'cheese']}

df = pd.DataFrame(data)

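How can I group the data by col1 and then, as extra columns, get a specific aggregate for selected values of col3?

For example, say I want to group by col1 and get the sum of col2 for the gas, cash and online values of col3 for each person in col1, like this: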

col1    gas_sum    cash_sum    online_sum
tom        3          2             5
jerry      5          0             5
scooby     0          8             0

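I'm still fairly new to pandas, and the only way I can think of is a for loop over all the data, since groupby seems aimed more at providing a sum/mean of a single column (col2 in my example).

Any help is appreciated.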

Answer 1

IIUC，

we can chain isin, groupby and unstack:

df1 = df.loc[df["col3"].isin(["gas", "online", "cash"])].groupby(["col1", "col3"])[
    "col2"
].sum().unstack().fillna(0)

df1.columns = df1.columns.map(lambda x : x + '_sum')

df1.columns.name = ''

print(df1)

        cash_sum  gas_sum  online_sum
col1                                 
jerry        0.0      5.0         5.0
scooby       8.0      0.0         0.0
tom          2.0      3.0         5.0
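
The sums above come out as floats because unstack leaves NaN for missing combinations before fillna runs. As a minimal sketch of a variation (not part of the original answer), passing fill_value=0 to unstack should keep the integers, and reindex can put the columns in the order the question asked for. The names wanted and result are made up for illustration.

```python
import pandas as pd

data = {
    "col1": ["tom", "tom", "tom", "jerry", "jerry", "jerry", "scooby", "scooby", "scooby"],
    "col2": [2, 3, 5, 1, 4, 5, 8, 6, 1],
    "col3": ["cash", "gas", "online", "online", "online", "gas", "cash", "dogfood", "cheese"],
}
df = pd.DataFrame(data)

# Values of col3 to sum, in the column order requested in the question.
wanted = ["gas", "cash", "online"]

result = (
    df[df["col3"].isin(wanted)]            # keep only the rows of interest
    .groupby(["col1", "col3"])["col2"]
    .sum()
    .unstack(fill_value=0)                 # fill missing pairs with 0 instead of NaN
    .reindex(columns=wanted, fill_value=0) # enforce column order
    .add_suffix("_sum")
    .rename_axis(None, axis=1)             # drop the leftover "col3" columns name
    .reset_index()                         # turn col1 back into a regular column
)
print(result)
```

Rows are ordered by the sorted group keys, not by first appearance in the data.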

Answer 2

Another approach using pivot_table. We also use reindex to keep only the values you are interested in, and add_suffix to change the column names:

# Values to sum
values = ['cash', 'gas', 'online']

df_out = (df.pivot_table(index='col1', columns='col3',
                         values='col2', aggfunc='sum',
                         fill_value=0)
 .reindex(columns=values, fill_value=0)
 .add_suffix('_sum'))

[out]

col3    cash_sum  gas_sum  online_sum
col1                                 
jerry          0        5           5
scooby         8        0           0
tom            2        3           5
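
One side effect of the reindex step, sketched here under the assumption that you may ask for a value that never occurs in col3: the column is still created and filled with 0, and unrequested values such as dogfood are dropped. The value "card" is hypothetical and used only for illustration.

```python
import pandas as pd

data = {
    "col1": ["tom", "tom", "tom", "jerry", "jerry", "jerry", "scooby", "scooby", "scooby"],
    "col2": [2, 3, 5, 1, 4, 5, 8, 6, 1],
    "col3": ["cash", "gas", "online", "online", "online", "gas", "cash", "dogfood", "cheese"],
}
df = pd.DataFrame(data)

# "card" is a hypothetical value that does not appear anywhere in col3.
values = ["cash", "gas", "card"]

df_out = (
    df.pivot_table(index="col1", columns="col3",
                   values="col2", aggfunc="sum",
                   fill_value=0)
    .reindex(columns=values, fill_value=0)  # adds card as all zeros, drops the rest
    .add_suffix("_sum")
)
print(df_out)
```

This keeps the output shape stable even when the data changes, at the cost of silently hiding a misspelled value behind a column of zeros.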
