Question

I have a data frame like this:

       customer        fruit    price
0      cust1           mango     30
1      cust2           apple     45
2      cust1           banana    55
3      cust3           mango     22
4      cust4           banana    54
5      cust3           apple     55
6      cust2           apple     90
7      cust1           mango     45
8      cust3           banana    45
9      cust2           mango     23
10     cust4           mango     44

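I need to find out how much each customer spent on mango and on other fruits (everything that is not mango counts as a single category). E.g. cust1 mango = 75, cust1 other = 55, and likewise for every customer. Something like this: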

      customer   price spent_on_mango  spent_on_others
0      cust1          75                    55   
1      cust2          23                    135       
2      cust3          22                    100
3      cust4          44                    54

Please advise.

Answer 1

We can replace the elements of 'fruit' that are not 'mango' with 'others', then groupby on ('customer', 'fruit'), take the sum, and unstack.
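
The code for this answer did not survive in the thread, so here is a minimal sketch of the steps it describes, not the original author's code. The sample data is copied from the question; the names `labelled` and `result` are my own, and `Series.where` is just one way to do the relabelling.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer": ["cust1", "cust2", "cust1", "cust3", "cust4", "cust3",
                 "cust2", "cust1", "cust3", "cust2", "cust4"],
    "fruit": ["mango", "apple", "banana", "mango", "banana", "apple",
              "apple", "mango", "banana", "mango", "mango"],
    "price": [30, 45, 55, 22, 54, 55, 90, 45, 45, 23, 44],
})

# Keep 'mango' as is and relabel every other fruit as 'others'.
# assign() returns a new frame, so the original df is left untouched.
labelled = df.assign(fruit=df["fruit"].where(df["fruit"] == "mango", "others"))

# Sum price per (customer, fruit) pair, then move 'fruit' into the columns.
result = (
    labelled.groupby(["customer", "fruit"])["price"]
    .sum()
    .unstack(fill_value=0)
)
print(result)
```

The result has one row per customer with a `mango` column and an `others` column; `fill_value=0` only matters for customers who bought nothing in one of the two categories.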

Answer 2

Why not create a column that indicates whether the fruit is mango, and then include it in the groupby?
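
This answer gives no code, so the following is only a sketch of how the idea could look. The flag column name `is_mango` and the variable names are hypothetical; the output column names are borrowed from the question's desired table.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer": ["cust1", "cust2", "cust1", "cust3", "cust4", "cust3",
                 "cust2", "cust1", "cust3", "cust2", "cust4"],
    "fruit": ["mango", "apple", "banana", "mango", "banana", "apple",
              "apple", "mango", "banana", "mango", "mango"],
    "price": [30, 45, 55, 22, 54, 55, 90, 45, 45, 23, 44],
})

# Boolean flag: True where the fruit is mango ('is_mango' is an assumed name).
flagged = df.assign(is_mango=df["fruit"].eq("mango"))

# Group by customer and the flag, sum the prices, and spread the flag
# across the columns (giving one False column and one True column).
totals = (
    flagged.groupby(["customer", "is_mango"])["price"]
    .sum()
    .unstack(fill_value=0)
)

# Give the boolean column labels readable names.
totals = totals.rename(columns={True: "spent_on_mango", False: "spent_on_others"})
print(totals)
```

Compared with overwriting the 'fruit' column, this keeps the original fruit names available for other analyses.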

Answer 3

Another pandas approach:

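# Relabel every non-mango fruit; .loc avoids chained assignment
df.loc[df['fruit'] != 'mango', 'fruit'] = 'other_fruit'
# Total price per customer, one column per fruit label
pd.pivot_table(df, values='price', index='customer', columns='fruit', aggfunc='sum')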

fruit     mango  other_fruit
customer                    
cust1        75           55
cust2        23          135
cust3        22          100
cust4        44           54

Answer 4

As an alternative, you can do this with pivot_table:

In [11]: res = df.pivot_table("price", "customer", "fruit", fill_value=0)

In [12]: res
Out[12]:
fruit     apple  banana  mango
customer
cust1       0.0      55   37.5
cust2      67.5       0   23.0
cust3      55.0      45   22.0
cust4       0.0      54   44.0

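This may be good enough (note that pivot_table uses the mean by default, so the values above are averages, not totals), but you can create the desired "not mango" column: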

In [13]: mango = res.pop("mango")

In [14]: res.sum(axis=1).to_frame(name="not mango").join(mango)
Out[14]:
          not mango  mango
customer
cust1          55.0   37.5
cust2          67.5   23.0
cust3         100.0   22.0
cust4          54.0   44.0
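
To get totals rather than averages, the same steps can be run with `aggfunc="sum"`. This is a sketch under the assumption that `df` holds the question's sample data; it is a variant of the session above, not the answerer's original output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer": ["cust1", "cust2", "cust1", "cust3", "cust4", "cust3",
                 "cust2", "cust1", "cust3", "cust2", "cust4"],
    "fruit": ["mango", "apple", "banana", "mango", "banana", "apple",
              "apple", "mango", "banana", "mango", "mango"],
    "price": [30, 45, 55, 22, 54, 55, 90, 45, 45, 23, 44],
})

# One column per fruit, summing prices instead of averaging them.
res = df.pivot_table(values="price", index="customer", columns="fruit",
                     aggfunc="sum", fill_value=0)

# Pull the mango column out, then add up whatever fruits remain.
mango = res.pop("mango")
out = res.sum(axis=1).to_frame(name="not mango").join(mango)
print(out)
```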

In general, if you see a stack/unstack, you should try "pivot" :)

Answer 5

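Try grouping by some columns and then applying sum(), like this:

print(dframe.groupby(["customer", "fruit"]).sum())

As the command says, it groups by the columns and sums the values.

It returns a DataFrame with one total per (customer, fruit) pair, which contains the information you need.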

Group by on a data frame to find the count and sum of a column
