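Note: this post is related to another post (Recoding longitudinal variables in R).
I have a data.table and I want to recode the values of one variable (homes) depending on the values of other variables. Basically, I want the observation with month == "1" and year == "2011" to take the value found at month == "12" and year == "2010" for the same LA. The data.table looks like this: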
> head(test,25)
LA month year entry exit total homes
1: Barking and Dagenham 10 2010 2 0 2 NA
2: Barking and Dagenham 11 2010 3 0 3 NA
3: Barking and Dagenham 12 2010 3 0 3 15
4: Barking and Dagenham 1 2011 6 0 6 NA
5: Barking and Dagenham 2 2011 1 0 1 NA
6: Barking and Dagenham 3 2011 2 0 2 NA
7: Barking and Dagenham 4 2011 1 0 1 NA
8: Barking and Dagenham 10 2011 1 0 1 NA
9: Barking and Dagenham 11 2011 1 0 1 NA
10: Barking and Dagenham 1 2012 1 0 1 NA
11: Barking and Dagenham 9 2012 1 0 1 NA
12: Barking and Dagenham 6 2013 2 0 2 NA
13: Barking and Dagenham 1 2014 0 1 -1 NA
14: Barking and Dagenham 12 2014 0 1 -1 NA
15: Barking and Dagenham 3 2015 1 1 0 NA
16: Barking and Dagenham 11 2015 1 1 0 NA
17: Barking and Dagenham 12 2015 1 0 1 NA
18: Barnet 11 2010 24 0 24 NA
19: Barnet 12 2010 28 0 28 86
20: Barnet 1 2011 28 0 28 NA
21: Barnet 2 2011 6 0 6 NA
22: Barnet 3 2011 1 0 1 NA
23: Barnet 4 2011 1 0 1 NA
24: Barnet 7 2011 2 0 2 NA
25: Barnet 8 2011 1 0 1 NA
LA month year entry exit total homes
The structure of this data.table is as follows:
Classes ‘data.table’ and 'data.frame': 4664 obs. of 7 variables:
$ LA : Factor w/ 151 levels "Barking and Dagenham",..: 1 1 1 1 1 1 1 1 1 1 ...
$ month: int 10 11 12 1 2 3 4 10 11 1 ...
$ year : int 2010 2010 2010 2011 2011 2011 2011 2011 2011 2012 ...
$ entry: int 2 3 3 6 1 2 1 1 1 1 ...
$ exit : int 0 0 0 0 0 0 0 0 0 0 ...
$ total: int 2 3 3 6 1 2 1 1 1 1 ...
$ homes: int NA NA 15 NA NA NA NA NA NA NA ...
To recode homes, I created a new variable, homes.1, using the following data.table calls:
test = test[year== "2011" & month == "1", homes.1 := as.numeric(!is.na(homes)), by = LA]
test[, homes.1 := ifelse(!is.na(homes.1),
test[month == "12" & year == "2010",homes],
homes.1), by=LA]
I get part of what I want. The variable homes.1 is recoded, but its values differ from the corresponding values of homes. The first 25 observations of test:
LA month year entry exit total homes homes.1
1: Barking and Dagenham 10 2010 2 0 2 NA NA
2: Barking and Dagenham 11 2010 3 0 3 NA NA
3: Barking and Dagenham 12 2010 3 0 3 15 NA
4: Barking and Dagenham 1 2011 6 0 6 NA 46
5: Barking and Dagenham 2 2011 1 0 1 NA NA
6: Barking and Dagenham 3 2011 2 0 2 NA NA
7: Barking and Dagenham 4 2011 1 0 1 NA NA
8: Barking and Dagenham 10 2011 1 0 1 NA NA
9: Barking and Dagenham 11 2011 1 0 1 NA NA
10: Barking and Dagenham 1 2012 1 0 1 NA NA
11: Barking and Dagenham 9 2012 1 0 1 NA NA
12: Barking and Dagenham 6 2013 2 0 2 NA NA
13: Barking and Dagenham 1 2014 0 1 -1 NA NA
14: Barking and Dagenham 12 2014 0 1 -1 NA NA
15: Barking and Dagenham 3 2015 1 1 0 NA NA
16: Barking and Dagenham 11 2015 1 1 0 NA NA
17: Barking and Dagenham 12 2015 1 0 1 NA NA
18: Barnet 11 2010 24 0 24 NA NA
19: Barnet 12 2010 28 0 28 86 NA
20: Barnet 1 2011 28 0 28 NA 55
21: Barnet 2 2011 6 0 6 NA NA
22: Barnet 3 2011 1 0 1 NA NA
23: Barnet 4 2011 1 0 1 NA NA
24: Barnet 7 2011 2 0 2 NA NA
25: Barnet 8 2011 1 0 1 NA NA
LA month year entry exit total homes homes.1
Likewise, the structure of test with homes.1 is:
> str(test)
Classes ‘data.table’ and 'data.frame': 4664 obs. of 8 variables:
$ LA : Factor w/ 151 levels "Barking and Dagenham",..: 1 1 1 1 1 1 1 1 1 1 ...
$ month : int 10 11 12 1 2 3 4 10 11 1 ...
$ year : int 2010 2010 2010 2011 2011 2011 2011 2011 2011 2012 ...
$ entry : int 2 3 3 6 1 2 1 1 1 1 ...
$ exit : int 0 0 0 0 0 0 0 0 0 0 ...
$ total : int 2 3 3 6 1 2 1 1 1 1 ...
$ homes : int NA NA 15 NA NA NA NA NA NA NA ...
$ homes.1: num NA NA NA 46 NA NA NA NA NA NA ...
- attr(*, ".internal.selfref")=<externalptr>
I would like to know why homes.1 is not recoded correctly from the values of homes. The expected output should look like this:
LA month year entry exit total homes homes.1
1: Barking and Dagenham 10 2010 2 0 2 NA NA
2: Barking and Dagenham 11 2010 3 0 3 NA NA
3: Barking and Dagenham 12 2010 3 0 3 15 NA
4: Barking and Dagenham 1 2011 6 0 6 NA 15
5: Barking and Dagenham 2 2011 1 0 1 NA NA
6: Barking and Dagenham 3 2011 2 0 2 NA NA
7: Barking and Dagenham 4 2011 1 0 1 NA NA
8: Barking and Dagenham 10 2011 1 0 1 NA NA
9: Barking and Dagenham 11 2011 1 0 1 NA NA
10: Barking and Dagenham 1 2012 1 0 1 NA NA
11: Barking and Dagenham 9 2012 1 0 1 NA NA
12: Barking and Dagenham 6 2013 2 0 2 NA NA
13: Barking and Dagenham 1 2014 0 1 -1 NA NA
14: Barking and Dagenham 12 2014 0 1 -1 NA NA
15: Barking and Dagenham 3 2015 1 1 0 NA NA
16: Barking and Dagenham 11 2015 1 1 0 NA NA
17: Barking and Dagenham 12 2015 1 0 1 NA NA
18: Barnet 11 2010 24 0 24 NA NA
19: Barnet 12 2010 28 0 28 86 NA
20: Barnet 1 2011 28 0 28 NA 86
21: Barnet 2 2011 6 0 6 NA NA
22: Barnet 3 2011 1 0 1 NA NA
23: Barnet 4 2011 1 0 1 NA NA
24: Barnet 7 2011 2 0 2 NA NA
25: Barnet 8 2011 1 0 1 NA NA
LA month year entry exit total homes homes.1
Answer 0 (score: 1)
You get 0 because, for the rows where year == "2011" & month == "1", the expression is.na(homes) returns TRUE, so !is.na(homes) returns FALSE.
Then, because you create homes.1 by coercing that logical value to numeric, TRUE is automatically converted to 1 and FALSE to 0.
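The coercion can be checked in isolation. This is a minimal sketch; the vector `homes` below is a made-up two-element stand-in, not the column from `test`:

```r
# Hypothetical stand-in: one missing value (like a January 2011 row)
# and one recorded value (like a December 2010 row)
homes <- c(NA_integer_, 15L)

flag <- !is.na(homes)            # FALSE where homes is NA, TRUE otherwise
homes_flag <- as.numeric(flag)   # logical -> numeric: FALSE becomes 0, TRUE becomes 1

print(homes_flag)
```

So after the first statement, homes.1 only marks whether homes was present on the selected rows; it does not carry any value of homes.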
However, that is not necessarily a problem. You can replace these 0 values with a second statement that looks like this:
test[, homes.1 := ifelse(!is.na(homes.1),
test[month == "12" & year == "2010",homes],
homes.1),
by=LA]
Can you tell me whether this works? If it does, you can simply merge the two columns.
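If the values still look off, one thing worth checking: inside `j`, the inner call `test[month == "12" & year == "2010", homes]` is evaluated against the whole table, not against the current `by = LA` group, so `ifelse` receives the December 2010 values of every LA and picks from them by row position within the group. That may be why 46 and 55 appear instead of 15 and 86. A possible alternative, sketched under assumptions, is to look up each LA's December 2010 value explicitly. The small table below is a stand-in built from rows shown in the question, and the names `dec2010` and `homes_dec` are hypothetical:

```r
library(data.table)

# Stand-in for `test`, using a few rows copied from the question's output
test <- data.table(
  LA    = c(rep("Barking and Dagenham", 4), rep("Barnet", 3)),
  month = c(10L, 11L, 12L, 1L, 11L, 12L, 1L),
  year  = c(2010L, 2010L, 2010L, 2011L, 2010L, 2010L, 2011L),
  homes = c(NA, NA, 15L, NA, NA, 86L, NA)
)

# One row per LA holding its December 2010 value (hypothetical helper table)
dec2010 <- test[month == 12 & year == 2010, .(LA, homes_dec = homes)]

# Start from an all-NA column, then fill only the January 2011 rows,
# matching each row's LA against the helper table
test[, homes.1 := NA_real_]
test[month == 1 & year == 2011,
     homes.1 := as.numeric(dec2010$homes_dec[match(LA, dec2010$LA)])]

print(test)
```

An LA without a December 2010 row gets NA from `match`, so its January 2011 row simply stays NA. This assumes at most one December 2010 row per LA, as in the rows shown.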