I have a data frame df (it can be downloaded here) that comes from a register of companies and looks like this:
Provider.ID Local.Authority month year entry exit total
1 1-102642676 Warwickshire 10 2010 2 0 2
2 1-102642676 Bury 10 2010 1 0 1
3 1-102642676 Kent 10 2010 1 0 1
4 1-102642676 Essex 10 2010 1 0 1
5 1-102642676 Lambeth 10 2010 2 0 2
6 1-102642676 East Sussex 10 2010 5 0 5
7 1-102642676 Bristol, City of 10 2010 1 0 1
8 1-102642676 Liverpool 10 2010 1 0 1
9 1-102642676 Merton 10 2010 1 0 1
10 1-102642676 Cheshire East 10 2010 2 0 2
11 1-102642676 Knowsley 10 2010 1 0 1
12 1-102642676 North Yorkshire 10 2010 1 0 1
13 1-102642676 Kingston upon Thames 10 2010 1 0 1
14 1-102642676 Lewisham 10 2010 1 0 1
15 1-102642676 Wiltshire 10 2010 1 0 1
16 1-102642676 Hampshire 10 2010 1 0 1
17 1-102642676 Wandsworth 10 2010 1 0 1
18 1-102642676 Brent 10 2010 1 0 1
19 1-102642676 West Sussex 10 2010 1 0 1
20 1-102642676 Windsor and Maidenhead 10 2010 1 0 1
21 1-102642676 Luton 10 2010 1 0 1
22 1-102642676 Enfield 10 2010 1 0 1
23 1-102642676 Somerset 10 2010 1 0 1
24 1-102642676 Cambridgeshire 10 2010 1 0 1
25 1-102642676 Hillingdon 10 2010 1 0 1
26 1-102642676 Havering 10 2010 1 0 1
27 1-102642676 Solihull 10 2010 1 0 1
28 1-102642676 Bexley 10 2010 1 0 1
29 1-102642676 Sandwell 10 2010 1 0 1
30 1-102642676 Southampton 10 2010 1 0 1
31 1-102642676 Trafford 10 2010 1 0 1
32 1-102642676 Newham 10 2010 1 0 1
33 1-102642676 West Berkshire 10 2010 1 0 1
34 1-102642676 Reading 10 2010 1 0 1
35 1-102642676 Hartlepool 10 2010 1 0 1
36 1-102642676 Hampshire 3 2011 1 0 1
37 1-102642676 Kent 9 2011 0 1 -1
38 1-102642676 North Yorkshire 12 2011 0 1 -1
39 1-102642676 North Somerset 12 2012 2 0 2
40 1-102642676 Kent 10 2014 1 0 1
41 1-102642676 Somerset 1 2016 0 1 -1
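My goal is to create a variable that holds the cumulative sum of the last variable (total) for each Local.Authority, accumulated over the years. total is simply the difference between entry and exit. I tried to do this with dplyr as follows: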
library(dplyr)
df.1 = df %>% group_by(Local.Authority, year) %>%
mutate(cum.total = cumsum(total)) %>%
arrange(year, month, Local.Authority)
which produces the following (wrong) result:
> df.1
Source: local data frame [41 x 8]
Groups: Local.Authority, year [41]
Provider.ID Local.Authority month year entry exit total cum.total
<fctr> <fctr> <int> <int> <int> <int> <int> <int>
1 1-102642676 Bexley 10 2010 1 0 1 35
2 1-102642676 Brent 10 2010 1 0 1 25
3 1-102642676 Bristol, City of 10 2010 1 0 1 13
4 1-102642676 Bury 10 2010 1 0 1 3
5 1-102642676 Cambridgeshire 10 2010 1 0 1 31
6 1-102642676 Cheshire East 10 2010 2 0 2 17
7 1-102642676 East Sussex 10 2010 5 0 5 12
8 1-102642676 Enfield 10 2010 1 0 1 29
9 1-102642676 Essex 10 2010 1 0 1 5
10 1-102642676 Hampshire 10 2010 1 0 1 23
.. ... ... ... ... ... ... ... ...
I confirmed that these results are wrong by checking a level of the variable Local.Authority that appears in several years (e.g. Kent):
> check = df.1 %>% filter(Local.Authority == "Kent")
> check
Source: local data frame [3 x 8]
Groups: Local.Authority, year [3]
Provider.ID Local.Authority month year entry exit total cum.total
<fctr> <fctr> <int> <int> <int> <int> <int> <int>
1 1-102642676 Kent 10 2010 1 0 1 4
2 1-102642676 Kent 9 2011 0 1 -1 42
3 1-102642676 Kent 10 2014 1 0 1 44
whereas it should be:
Provider.ID Local.Authority month year entry exit total cum.total
<fctr> <fctr> <int> <int> <int> <int> <int> <int>
1 1-102642676 Kent 10 2010 1 0 1 1
2 1-102642676 Kent 9 2011 0 1 -1 0
3 1-102642676 Kent 10 2014 1 0 1 1
Does anyone know what might be going wrong with these cumsum results? Many thanks in advance.
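One detail that may help with diagnosis: the incorrect cum.total values shown above match a plain running sum of total over the whole data frame in its original row order, as if the grouping had no effect at all. A minimal base R check, with the total column typed in by hand from the 41 rows listed above (the names total_col and kent_rows are only for illustration):

```r
# total column of the 41 rows, in the original row order
total_col <- c(2, 1, 1, 1, 2, 5, 1, 1, 1, 2,   # rows 1-10
               rep(1, 25),                      # rows 11-35
               1, -1, -1, 2, 1, -1)             # rows 36-41

# Kent appears in rows 3, 37 and 40 of the original data
kent_rows <- c(3, 37, 40)

# Ungrouped running sum over every row
running <- cumsum(total_col)

running[kent_rows]  # 4 42 44, the same values as the wrong cum.total for Kent
```

One possible cause, which is an assumption and not something the post confirms, is that another attached package masks dplyr's mutate (plyr::mutate, for instance, does not operate per dplyr group). Calling dplyr::mutate explicitly, or starting a fresh session, would rule that out.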
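Answer 0 (score: 4)
When you group by Local.Authority and year, each Kent row falls into its own group, so cumsum only ever sees one value per group and the result is just 1, -1, 1. It is better to group by Local.Authority only; cumsum then runs over all of that authority's total values and gives 1, 0, 1.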
df <- df %>%
group_by(Local.Authority) %>%
mutate(cum.to = cumsum(total))
> df
Source: local data frame [3 x 8]
Groups: Local.Authority [1]
Provider.ID Local.Authority month year entry exit total cum.to
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1-102642676 Kent 10 2010 1 0 1 1
2 1-102642676 Kent 9 2011 0 1 -1 0
3 1-102642676 Kent 10 2014 1 0 1 1
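The difference between the two groupings can be reproduced on the three Kent rows alone. The following is a minimal sketch, assuming dplyr is installed; the data frame kent and the names by_both and by_authority are made up for this illustration and contain only the columns needed.

```r
library(dplyr)

# The three Kent rows from the question (only the relevant columns)
kent <- data.frame(
  Local.Authority = "Kent",
  month = c(10, 9, 10),
  year  = c(2010, 2011, 2014),
  total = c(1, -1, 1)
)

# Grouped by authority AND year: every row is its own group here,
# so cumsum() returns total unchanged
by_both <- kent %>%
  group_by(Local.Authority, year) %>%
  mutate(cum.total = cumsum(total)) %>%
  ungroup()

# Grouped by authority only: the running sum carries across years
by_authority <- kent %>%
  group_by(Local.Authority) %>%
  mutate(cum.total = cumsum(total)) %>%
  ungroup()

by_both$cum.total       # 1 -1 1
by_authority$cum.total  # 1 0 1
```

The trailing ungroup() is optional; it only ensures that later steps do not silently keep operating per group.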
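Answer 1 (score: 0)
I found a solution to my problem. After restarting my session, I got the result I wanted simply by grouping by Local.Authority only and then arranging: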
> df.1 = df %>% group_by(Local.Authority) %>%
+ mutate(cum.total = cumsum(total)) %>%
+ arrange(year, month, Local.Authority)
> df.1
Source: local data frame [41 x 8]
Groups: Local.Authority [36]
Provider.ID Local.Authority month year entry exit total cum.total
<fctr> <fctr> <int> <int> <int> <int> <int> <int>
1 1-102642676 Bexley 10 2010 1 0 1 1
2 1-102642676 Brent 10 2010 1 0 1 1
3 1-102642676 Bristol, City of 10 2010 1 0 1 1
4 1-102642676 Bury 10 2010 1 0 1 1
5 1-102642676 Cambridgeshire 10 2010 1 0 1 1
6 1-102642676 Cheshire East 10 2010 2 0 2 2
7 1-102642676 East Sussex 10 2010 5 0 5 5
8 1-102642676 Enfield 10 2010 1 0 1 1
9 1-102642676 Essex 10 2010 1 0 1 1
10 1-102642676 Hampshire 10 2010 1 0 1 1
Checking "Kent" now gives the expected result:
> check = df.1 %>% filter(Local.Authority == "Kent")
> check
Source: local data frame [3 x 8]
Groups: Local.Authority [1]
Provider.ID Local.Authority month year entry exit total cum.total
<fctr> <fctr> <int> <int> <int> <int> <int> <int>
1 1-102642676 Kent 10 2010 1 0 1 1
2 1-102642676 Kent 9 2011 0 1 -1 0
3 1-102642676 Kent 10 2014 1 0 1 1
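Note that cumsum follows the current row order, and in the code above arrange runs only after the running sum has been computed. That works here because the data happen to be stored chronologically already. If that cannot be guaranteed, sorting before the cumsum is safer. A minimal sketch, assuming dplyr is installed and using a deliberately shuffled, made-up copy of the Kent rows (shuffled and result are illustrative names):

```r
library(dplyr)

# Hypothetical input: the Kent rows, deliberately out of chronological order
shuffled <- data.frame(
  Local.Authority = "Kent",
  month = c(10, 10, 9),
  year  = c(2014, 2010, 2011),
  total = c(1, 1, -1)
)

result <- shuffled %>%
  arrange(Local.Authority, year, month) %>%  # sort chronologically first
  group_by(Local.Authority) %>%
  mutate(cum.total = cumsum(total)) %>%      # running sum in sorted order
  ungroup()

result$cum.total  # 1 0 1
```

Without the arrange step, the same pipeline on shuffled would accumulate in storage order and give 1, 2, 1 for the rows as stored.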