更改groupby中的列值

时间:2018-09-02 18:58:49

标签: python pandas group-by

这是一个数据框:

           id         value next_value      step
0       10764  612486.33921   800000.0       0.0
1       10764  612486.33921  1000000.0  200000.0
2       10822   97604.30520   100000.0       0.0
3       10823   97805.45851   100000.0       0.0
4       10823   97805.45851   165000.0   65000.0
5       10823   97805.45851   250000.0   85000.0
6       10823   97805.45851   400000.0  150000.0
7       10823   97805.45851   600000.0  200000.0
8       10823   97805.45851   800000.0  200000.0
9       10823   97805.45851  1000000.0  200000.0
10      10823   97805.45851  1200000.0  200000.0
11      10882   33843.33701    40000.0       0.0
12      10882   33843.33701    60000.0   20000.0
13      10882   33843.33701    90000.0   30000.0
14      10882   33843.33701   125000.0   35000.0
15      10882   33843.33701   150000.0   25000.0
16      10882   33843.33701   175000.0   25000.0
17      10882   33843.33701   205000.0   30000.0
18      10882   33843.33701   230000.0   25000.0
19      10882   33843.33701   300000.0   70000.0
20      10882   33843.33701   440000.0  140000.0
21      10758  241890.94525   385000.0       0.0
22      10817  138519.73560   150000.0       0.0
23      10817  138519.73560   200000.0   50000.0
24      10817  138519.73560   250000.0   50000.0
25      10817  138519.73560   300000.0   50000.0
26      10817  138519.73560   350000.0   50000.0
27      10817  138519.73560   400000.0   50000.0
28      10859   73140.30048    80000.0       0.0
29      10859   73140.30048   100000.0   20000.0

我需要将stepnext_valuevalue组中的第一个差异加到id上。

即对于id=10764 step = step + 800000.0 - 612486.33921

id=10822 step = step + 100000.0 - 97604.30520

我已经尝试过了:

df['step'] += df.groupby('id')[['next_value', 'value']].apply
    (lambda x: x.iloc[0, 0] - x.iloc[0, 1])

但这给了我所有等于step的{​​{1}}值。

我在这里可以做什么?

1 个答案:

答案 0 :(得分:2)

使用groupby id并获得first值。然后,对s使用df.step

In [717]: s = df.groupby('id').transform('first')

In [718]: df.step + s.next_value - s.value
Out[718]:
0     187513.66079
1     387513.66079
2       2395.69480
3       2194.54149
4      67194.54149
5      87194.54149
6     152194.54149
7     202194.54149
8     202194.54149
9     202194.54149
10    202194.54149
11      6156.66299
12     26156.66299
13     36156.66299
14     41156.66299
15     31156.66299
16     31156.66299
17     36156.66299
18     31156.66299
19     76156.66299
20    146156.66299
21    143109.05475
22     11480.26440
23     61480.26440
24     61480.26440
25     61480.26440
26     61480.26440
27     61480.26440
28      6859.69952
29     26859.69952
dtype: float64

详细信息

In [719]: s
Out[719]:
           value  next_value  step
0   612486.33921    800000.0   0.0
1   612486.33921    800000.0   0.0
2    97604.30520    100000.0   0.0
3    97805.45851    100000.0   0.0
4    97805.45851    100000.0   0.0
5    97805.45851    100000.0   0.0
6    97805.45851    100000.0   0.0
7    97805.45851    100000.0   0.0
8    97805.45851    100000.0   0.0
9    97805.45851    100000.0   0.0
10   97805.45851    100000.0   0.0
11   33843.33701     40000.0   0.0
12   33843.33701     40000.0   0.0
13   33843.33701     40000.0   0.0
14   33843.33701     40000.0   0.0
15   33843.33701     40000.0   0.0
16   33843.33701     40000.0   0.0
17   33843.33701     40000.0   0.0
18   33843.33701     40000.0   0.0
19   33843.33701     40000.0   0.0
20   33843.33701     40000.0   0.0
21  241890.94525    385000.0   0.0
22  138519.73560    150000.0   0.0
23  138519.73560    150000.0   0.0
24  138519.73560    150000.0   0.0
25  138519.73560    150000.0   0.0
26  138519.73560    150000.0   0.0
27  138519.73560    150000.0   0.0
28   73140.30048     80000.0   0.0
29   73140.30048     80000.0   0.0