Question

I have this DataFrame (it is just an example, not the real data):

In [1]: import pandas as pd
        my_data = [{'client_id' : '001', 'items' : '10', 'month' : 'Jan'},
                   {'client_id' : '001', 'items' : '20', 'month' : 'Feb'},
                   {'client_id' : '001', 'items' : '30', 'month' : 'Mar'},
                   {'client_id' : '002', 'items' : '30', 'month' : 'Jan'},
                   {'client_id' : '002', 'items' : '20', 'month' : 'Feb'},
                   {'client_id' : '002', 'items' : '15', 'month' : 'Mar'},
                   {'client_id' : '003', 'items' : '10', 'month' : 'Jan'},
                   {'client_id' : '003', 'items' : '20', 'month' : 'Feb'},
                   {'client_id' : '003', 'items' : '15', 'month' : 'Mar'}]
        df = pd.DataFrame(my_data)

In  [2]: df
Out [2]:    
            client_id   month        items
         0        001     Jan           10
         1        001     Feb           20
         2        001     Mar           30
         3        002     Jan           30
         4        002     Feb           20
         5        002     Mar           15
         6        003     Jan           10
         7        003     Feb           20
         8        003     Mar           15

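What I want is to compute the change in items purchased between each pair of consecutive months. For example, client '001' bought 10 more items in Feb (20) than in Jan (10). Client '002' has a delta of -10 (20 in Feb versus 30 in Jan). The final DataFrame would look like this: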

In [3]: delta_df
Out [3]:   
            client_id   delta_items_feb   delta_items_mar
        0         001                10                10
        1         002               -10                -5
        2         003                10                -5

Any ideas on how to do this?

Answer 1

Here is one way, using pivot_table to first group the items by client and month:

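(I first cast the items column to integers with df['items'] = df['items'].astype(int). Bracket indexing is used because df.items is also the name of a DataFrame method.)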

>>> import numpy as np
>>> table = df.pivot_table(values='items', index='client_id', columns='month')
>>> table = table[['Jan', 'Feb', 'Mar']]
>>> pd.DataFrame(np.diff(table.values),
                 columns=['delta_items_feb', 'delta_items_mar'],
                 index=table.index).reset_index()

  client_id  delta_items_feb  delta_items_mar
0       001               10               10
1       002              -10               -5
2       003               10               -5

Note: this uses the index / columns keywords of current pandas. Very old releases called them rows / cols when creating the pivot table.

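This:

1) pivots the data by client and month, so each client's items are shown per month
2) makes sure the columns of the table are in the correct month order
3) uses np.diff to compute the differences between consecutive months, and builds a new DataFrame with the desired column names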
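
For anyone who wants to run the three steps above end to end, here is a minimal self-contained sketch. It assumes the sample data from the question. The aggfunc='sum' argument is my own addition, not part of the original answer: the default aggregation is the mean, which would turn the integers into floats.

```python
import numpy as np
import pandas as pd

# Sample data from the question; 'items' starts out as strings.
df = pd.DataFrame({
    'client_id': ['001'] * 3 + ['002'] * 3 + ['003'] * 3,
    'month': ['Jan', 'Feb', 'Mar'] * 3,
    'items': ['10', '20', '30', '30', '20', '15', '10', '20', '15'],
})
df['items'] = df['items'].astype(int)

# Step 1: one row per client, one column per month.
# aggfunc='sum' is an assumption made here to keep integer dtype.
table = df.pivot_table(values='items', index='client_id',
                       columns='month', aggfunc='sum')

# Step 2: put the month columns in calendar order (pivoting sorts them alphabetically).
table = table[['Jan', 'Feb', 'Mar']]

# Step 3: difference between neighbouring columns, row by row.
delta_df = pd.DataFrame(
    np.diff(table.values, axis=1),
    columns=['delta_items_feb', 'delta_items_mar'],
    index=table.index,
).reset_index()

print(delta_df)
```

If a client had no row for some month, the pivoted table would contain NaN there and the deltas next to it would be NaN too, so real data may need a fill or a filter first.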

Answer 2

Kudos for a very clear question. Group by client and compute the deltas within each group:

>>> df['deltas'] = df['items'].astype(int).groupby(df['client_id']).diff()
>>> df

  client_id  items month  deltas
0       001     10   Jan     NaN
1       001     20   Feb      10
2       001     30   Mar      10
3       002     30   Jan     NaN
4       002     20   Feb     -10
5       002     15   Mar      -5
6       003     10   Jan     NaN
7       003     20   Feb      10
8       003     15   Mar      -5

Finally, pivot it into the shape you want, dropping the Jan column:

>>> df.pivot(index='client_id', columns='month', values='deltas')\
      .drop('Jan', axis=1)

month       Feb  Mar
client_id       
001         10  10
002        -10  -5
003         10  -5
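
A per-group diff only gives month-over-month deltas if the rows of each client are already in calendar order. Below is a rough sketch of the same idea with an explicit sort added. The month_number mapping and the month_num helper column are my own assumptions for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data from the question, with 'items' cast to int.
df = pd.DataFrame({
    'client_id': ['001'] * 3 + ['002'] * 3 + ['003'] * 3,
    'month': ['Jan', 'Feb', 'Mar'] * 3,
    'items': ['10', '20', '30', '30', '20', '15', '10', '20', '15'],
})
df['items'] = df['items'].astype(int)

# Hypothetical helper: map month names to numbers so rows can be sorted by calendar order.
month_number = {'Jan': 1, 'Feb': 2, 'Mar': 3}
df['month_num'] = df['month'].map(month_number)
df = df.sort_values(['client_id', 'month_num'])

# Difference to the previous row within each client; the first month has no predecessor (NaN).
df['deltas'] = df.groupby('client_id')['items'].diff()

# Reshape to one row per client and keep only the months that have a delta.
wide = df.pivot(index='client_id', columns='month', values='deltas')[['Feb', 'Mar']]

print(wide)
```

Because the Jan rows hold NaN, the deltas column is floating point, so the printed values appear as 10.0 rather than 10.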

Answer 3

Nothing fancy, but this worked for me:

# convert 'items' from string to int
df["items"] = df["items"].astype(int)

# use pivot to make one column for each unique value in the "month" column
# (keyword arguments are needed here; recent pandas no longer accepts them positionally)
dfp = df.pivot(index='client_id', columns='month', values='items')

# calculate delta and put in a new column 
dfp["dJF"] = dfp.Feb - dfp.Jan

which gives:

month     Feb Jan Mar  dJF
client_id                 
001        20  10  30   10
002        20  30  15  -10
003        20  10  15   10
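
The same column subtraction can be repeated for every pair of consecutive months. Here is a small sketch, assuming the months are known in advance. The month_order list and the delta_items_* column names are illustrative choices borrowed from the question, not something the answer above defines.

```python
import pandas as pd

# Sample data from the question, with 'items' already stored as integers.
df = pd.DataFrame({
    'client_id': ['001'] * 3 + ['002'] * 3 + ['003'] * 3,
    'month': ['Jan', 'Feb', 'Mar'] * 3,
    'items': [10, 20, 30, 30, 20, 15, 10, 20, 15],
})

# One column per month, one row per client.
dfp = df.pivot(index='client_id', columns='month', values='items')

# Assumed calendar order of the months present in the data.
month_order = ['Jan', 'Feb', 'Mar']

# Subtract each month from the one that follows it.
for prev, curr in zip(month_order, month_order[1:]):
    dfp[f'delta_items_{curr.lower()}'] = dfp[curr] - dfp[prev]

delta_df = dfp[['delta_items_feb', 'delta_items_mar']].reset_index()
print(delta_df)
```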

Answer 4

1) Collect the client_id values into a set, then turn the set into a sorted list client_list: ['001', '002', '003'].
2) Map each month string to an int: Jan -> 1, Feb -> 2, Mar -> 3, and so on.
3) for client in client_list:
    create a new list for this client
    for line in your_data:
        when the client ids match, add the line to the list  # filter by client_id
    sort the result by month
    loop over this client's data to generate the rows of the output table:
    delta_items_mar = item[n] - item[n-1]
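
The outline above can be written without pandas. This is only a sketch of one possible reading of it. MONTH_NUMBER, client_rows and delta_rows are names I made up for illustration, and the delta_items_* keys follow the column names used in the question.

```python
# Sample data from the question, as a list of dicts with string values.
my_data = [
    {'client_id': cid, 'items': items, 'month': month}
    for cid, monthly in [('001', ['10', '20', '30']),
                         ('002', ['30', '20', '15']),
                         ('003', ['10', '20', '15'])]
    for month, items in zip(['Jan', 'Feb', 'Mar'], monthly)
]

# Step 2: month string -> int, so rows can be sorted by calendar order.
MONTH_NUMBER = {'Jan': 1, 'Feb': 2, 'Mar': 3}

# Step 1: unique client ids, sorted.
client_list = sorted({row['client_id'] for row in my_data})

# Step 3: build one output row per client.
delta_rows = []
for client in client_list:
    # Filter by client_id, then sort this client's rows by month.
    client_rows = sorted(
        (row for row in my_data if row['client_id'] == client),
        key=lambda row: MONTH_NUMBER[row['month']],
    )
    items = [int(row['items']) for row in client_rows]

    out = {'client_id': client}
    for n in range(1, len(items)):
        month = client_rows[n]['month'].lower()
        out[f'delta_items_{month}'] = items[n] - items[n - 1]
    delta_rows.append(out)

for row in delta_rows:
    print(row)
```

The resulting list of dicts can be passed to pd.DataFrame if a DataFrame is wanted at the end.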

Calculate deltas from values in a DataFrame
