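I have a longitudinal data frame, prueba, that follows different units (variable LA) over time (variables month and year). The first 25 observations have the following structure: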
> head(prueba, 25)
LA month year entry exit total homes
1 Barking and Dagenham 10 2010 2 0 2 NA
2 Barking and Dagenham 11 2010 3 0 3 NA
3 Barking and Dagenham 12 2010 3 0 3 15
4 Barking and Dagenham 1 2011 6 0 6 NA
5 Barking and Dagenham 2 2011 1 0 1 NA
6 Barking and Dagenham 3 2011 2 0 2 NA
7 Barking and Dagenham 4 2011 1 0 1 NA
8 Barking and Dagenham 10 2011 1 0 1 NA
9 Barking and Dagenham 11 2011 1 0 1 NA
10 Barking and Dagenham 1 2012 1 0 1 NA
11 Barking and Dagenham 9 2012 1 0 1 NA
12 Barking and Dagenham 6 2013 2 0 2 NA
13 Barking and Dagenham 1 2014 0 1 -1 NA
14 Barking and Dagenham 12 2014 0 1 -1 NA
15 Barking and Dagenham 3 2015 1 1 0 NA
16 Barking and Dagenham 11 2015 1 1 0 NA
17 Barking and Dagenham 12 2015 1 0 1 NA
18 Barnet 11 2010 24 0 24 NA
19 Barnet 12 2010 28 0 28 86
20 Barnet 1 2011 28 0 28 NA
21 Barnet 2 2011 6 0 6 NA
22 Barnet 3 2011 1 0 1 NA
23 Barnet 4 2011 1 0 1 NA
24 Barnet 7 2011 2 0 2 NA
25 Barnet 8 2011 1 0 1 NA
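My goal is to recode the homes variable by assigning its non-missing value to the observation with month == "2" and year == "2011". If a unit has no observation for these values of month and year, the recoded observation should be the one with month == "1" and year == "2011". Ideally, the expected output would look like this: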
> head(prueba, 25)
LA month year entry exit total homes
1 Barking and Dagenham 10 2010 2 0 2 NA
2 Barking and Dagenham 11 2010 3 0 3 NA
3 Barking and Dagenham 12 2010 3 0 3 NA
4 Barking and Dagenham 1 2011 6 0 6 NA
5 Barking and Dagenham 2 2011 1 0 1 15
6 Barking and Dagenham 3 2011 2 0 2 NA
7 Barking and Dagenham 4 2011 1 0 1 NA
8 Barking and Dagenham 10 2011 1 0 1 NA
9 Barking and Dagenham 11 2011 1 0 1 NA
10 Barking and Dagenham 1 2012 1 0 1 NA
11 Barking and Dagenham 9 2012 1 0 1 NA
12 Barking and Dagenham 6 2013 2 0 2 NA
13 Barking and Dagenham 1 2014 0 1 -1 NA
14 Barking and Dagenham 12 2014 0 1 -1 NA
15 Barking and Dagenham 3 2015 1 1 0 NA
16 Barking and Dagenham 11 2015 1 1 0 NA
17 Barking and Dagenham 12 2015 1 0 1 NA
18 Barnet 11 2010 24 0 24 NA
19 Barnet 12 2010 28 0 28 NA
20 Barnet 1 2011 28 0 28 NA
21 Barnet 2 2011 6 0 6 86
22 Barnet 3 2011 1 0 1 NA
23 Barnet 4 2011 1 0 1 NA
24 Barnet 7 2011 2 0 2 NA
25 Barnet 8 2011 1 0 1 NA
I have tried to solve this with data.table along the following lines:
test = data.table(prueba)
setkey(test, LA)
test$homes =test[, .SD[, ifelse(year == "2011" & month == "2", !is.na(homes), homes)], by=LA]
But it did not produce the expected output:
> head(test, 25)
LA month year entry exit total homes
1: Barking and Dagenham 10 2010 2 0 2 NA
2: Barking and Dagenham 11 2010 3 0 3 NA
3: Barking and Dagenham 12 2010 3 0 3 15
4: Barking and Dagenham 1 2011 6 0 6 NA
5: Barking and Dagenham 2 2011 1 0 1 NA
6: Barking and Dagenham 3 2011 2 0 2 NA
7: Barking and Dagenham 4 2011 1 0 1 NA
8: Barking and Dagenham 10 2011 1 0 1 NA
9: Barking and Dagenham 11 2011 1 0 1 NA
10: Barking and Dagenham 1 2012 1 0 1 NA
11: Barking and Dagenham 9 2012 1 0 1 NA
12: Barking and Dagenham 6 2013 2 0 2 NA
13: Barking and Dagenham 1 2014 0 1 -1 NA
14: Barking and Dagenham 12 2014 0 1 -1 NA
15: Barking and Dagenham 3 2015 1 1 0 NA
16: Barking and Dagenham 11 2015 1 1 0 NA
17: Barking and Dagenham 12 2015 1 0 1 NA
18: Barnet 11 2010 24 0 24 NA
19: Barnet 12 2010 28 0 28 86
20: Barnet 1 2011 28 0 28 NA
21: Barnet 2 2011 6 0 6 NA
22: Barnet 3 2011 1 0 1 NA
23: Barnet 4 2011 1 0 1 NA
24: Barnet 7 2011 2 0 2 NA
25: Barnet 8 2011 1 0 1 NA
LA month year entry exit total homes
I would be grateful if anyone could suggest an alternative approach, not necessarily with data.table.
Answer 0 (score: 1)
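library(dplyr)
dfs <- data.frame(prueba %>%
  group_by(LA) %>%
  # One row per LA holding the sum of its non-missing homes values
  summarise(Homes = sum(homes, na.rm = TRUE)) %>%
  # Attach that value to every row of the same LA
  inner_join(., prueba, by = 'LA') %>%
  # Keep it only on the month 2, year 2011 row
  mutate(Homes = ifelse(month == 2 & year == 2011, Homes, NA)))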
This should do the trick, and using the dplyr package makes it fast compared with doing it iteratively (e.g. with for or while).
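Two details from the question are worth spelling out. In the original attempt, !is.na(homes) returns TRUE or FALSE rather than the non-missing value itself, and the dplyr code above writes to a new Homes column without covering the fallback to month 1. The following is only a sketch of one way to handle both, not a definitive solution. The helper name move_homes and the toy data are made up for illustration, and it assumes each LA has at most one non-missing homes value and at most one row per month and year.

```r
library(data.table)

# Hypothetical helper: place the non-missing homes value on the
# month 2 / year 2011 row, falling back to month 1 / year 2011.
move_homes <- function(homes, month, year) {
  value  <- homes[!is.na(homes)][1]        # first non-missing value (NA if none)
  target <- c(which(year == 2011 & month == 2),
              which(year == 2011 & month == 1))[1]
  if (is.na(target)) return(homes)         # neither row exists: leave unchanged
  out <- homes
  out[] <- NA                              # blank the column, keeping its type
  out[target] <- value
  out
}

# Toy data with the question's column names (values are illustrative).
prueba <- data.frame(
  LA    = c("A", "A", "A", "A", "B", "B", "B"),
  month = c(12, 1, 2, 3, 12, 1, 3),
  year  = c(2010, 2011, 2011, 2011, 2010, 2011, 2011),
  homes = c(15, NA, NA, NA, 86, NA, NA)
)

test <- as.data.table(prueba)
test[, homes := move_homes(homes, month, year), by = LA]
# homes is now NA NA 15 NA NA 86 NA: unit A has a month 2 row,
# unit B does not, so its value lands on the month 1 row.
```

The same helper should also work with dplyr through prueba %>% group_by(LA) %>% mutate(homes = move_homes(homes, month, year)). Leaving homes untouched when a unit has neither row is a design choice made here, since the question does not say what should happen in that case.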