我有一个常量矢量年,一个data.frame。
> year <- c("2011","2012","2013","2014")
[1] "2011" "2012" "2013" "2014"
> data.frame
id nome year value
01 bob 2011 100
02 alice 2012 210
03 alice 2014 300
04 ted 2013 110
05 ted 2011 240
然后我想将这个data.frame与矢量值混合......
> new data.frame
id nome year value
01 bob 2011 100
02 bob 2012 0
03 bob 2013 0
04 bob 2014 0
05 alice 2011 0
06 alice 2012 210
07 alice 2013 0
08 alice 2014 300
09 ted 2011 240
10 ted 2012 0
11 ted 2013 110
12 ted 2014 0
如果您不在城市的特定行年份在线,则应将缺少的年份插入表格中,重复除最后一个总是为零(0)的所有其他字段。
我真正的问题是......
我有这个结构和行的临时(数据帧):
siaf uf estado municipio dtPub vlLiberado
1 643 AC ACRE ACRELANDIA 2011 873905.4
2 643 AC ACRE ACRELANDIA 2012 250000.0
3 643 AC ACRE ACRELANDIA 2013 311776.0
4 157 AC ACRE ASSIS BRASIL 2011 0.0
5 157 AC ACRE ASSIS BRASIL 2013 350000.0
6 157 AC ACRE ASSIS BRASIL 2014 200000.0
7 105 AC ACRE BRASILEIA 2011 1118075.7
8 105 AC ACRE BRASILEIA 2012 4277180.7
9 105 AC ACRE BRASILEIA 2013 4669340.0
10 105 AC ACRE BRASILEIA 2014 775000.0
11 645 AC ACRE BUJARI 2011 0.0
12 645 AC ACRE BUJARI 2013 300000.0
13 645 AC ACRE BUJARI 2014 823000.0
14 647 AC ACRE CAPIXABA 2011 317600.0
15 647 AC ACRE CAPIXABA 2012 300000.0
16 647 AC ACRE CAPIXABA 2013 146250.0
17 107 AC ACRE CRUZEIRO DO SUL 2011 10563673.3
18 107 AC ACRE CRUZEIRO DO SUL 2012 11187626.9
19 107 AC ACRE CRUZEIRO DO SUL 2013 7976846.8
20 107 AC ACRE CRUZEIRO DO SUL 2014 2203750.0
我需要运行统计时间序列,为此,我在data.frame temp中包含更多行,以便城市在dtPub字段中记录2011年,2012年,2013年和2014年的记录,并且包含的金额为零(0) 。 我需要以下结果...
siaf uf estado municipio dtPub vlLiberado
1 643 AC ACRE ACRELANDIA 2011 873905.4
2 643 AC ACRE ACRELANDIA 2012 250000.0
3 643 AC ACRE ACRELANDIA 2013 311776.0
643 AC ACRE ACRELANDIA 2014 0.0 << code add this line
4 157 AC ACRE ASSIS BRASIL 2011 0.0
157 AC ACRE ASSIS BRASIL 2012 0.0 << code add this line
5 157 AC ACRE ASSIS BRASIL 2013 350000.0
6 157 AC ACRE ASSIS BRASIL 2014 200000.0
7 105 AC ACRE BRASILEIA 2011 1118075.7
8 105 AC ACRE BRASILEIA 2012 4277180.7
9 105 AC ACRE BRASILEIA 2013 4669340.0
10 105 AC ACRE BRASILEIA 2014 775000.0
11 645 AC ACRE BUJARI 2011 0.0
11 645 AC ACRE BUJARI 2012 0.0 << code add this line
12 645 AC ACRE BUJARI 2013 300000.0
13 645 AC ACRE BUJARI 2014 823000.0
14 647 AC ACRE CAPIXABA 2011 317600.0
15 647 AC ACRE CAPIXABA 2012 300000.0
16 647 AC ACRE CAPIXABA 2013 146250.0
16 647 AC ACRE CAPIXABA 2014 0.0 << code add this line
17 107 AC ACRE CRUZEIRO DO SUL 2011 10563673.3
18 107 AC ACRE CRUZEIRO DO SUL 2012 11187626.9
19 107 AC ACRE CRUZEIRO DO SUL 2013 7976846.8
20 107 AC ACRE CRUZEIRO DO SUL 2014 2203750.0
答案 0 :(得分:2)
新的tidyr
更新有一个方便的功能来做到这一点。有关更多信息,请查看?complete
:
library(tidyr)
library(dplyr)
df %>% complete(nome, year, fill=list(value=0)) %>%
mutate(id=sprintf("%02d", 1:n()))
# Source: local data frame [12 x 4]
#
# nome year id value
# (fctr) (int) (chr) (dbl)
# 1 alice 2011 01 0
# 2 alice 2012 02 210
# 3 alice 2013 03 0
# 4 alice 2014 04 300
# 5 bob 2011 05 100
# 6 bob 2012 06 0
# 7 bob 2013 07 0
# 8 bob 2014 08 0
# 9 ted 2011 09 240
# 10 ted 2012 10 0
# 11 ted 2013 11 110
# 12 ted 2014 12 0
使用新数据:
library(tidyr)
complete(df, c(siaf,uf, estado, municipio), dtPub, fill=list(vlLiberado=0))
Source: local data frame [20 x 6]
siaf uf estado municipio dtPub vlLiberado
(int) (fctr) (fctr) (fctr) (int) (dbl)
1 105 AC ACRE BRASILEIA 2011 1118075.7
2 105 AC ACRE BRASILEIA 2012 4277180.7
3 105 AC ACRE BRASILEIA 2013 4669340.0
4 105 AC ACRE BRASILEIA 2014 775000.0
5 157 AC ACRE ASSISBRASIL 2011 0.0
6 157 AC ACRE ASSISBRASIL 2012 0.0
7 157 AC ACRE ASSISBRASIL 2013 350000.0
8 157 AC ACRE ASSISBRASIL 2014 200000.0
9 643 AC ACRE ACRELANDIA 2011 873905.4
10 643 AC ACRE ACRELANDIA 2012 250000.0
11 643 AC ACRE ACRELANDIA 2013 311776.0
12 643 AC ACRE ACRELANDIA 2014 0.0
13 645 AC ACRE BUJARI 2011 0.0
14 645 AC ACRE BUJARI 2012 0.0
15 645 AC ACRE BUJARI 2013 300000.0
16 645 AC ACRE BUJARI 2014 823000.0
17 647 AC ACRE CAPIXABA 2011 317600.0
18 647 AC ACRE CAPIXABA 2012 300000.0
19 647 AC ACRE CAPIXABA 2013 146250.0
20 647 AC ACRE CAPIXABA 2014 0.0