R data.table: output takes on different values when recoding a variable

Time: 2016-06-16 14:18:13

Tags: r data.table

Note: this post is related to another post (Recoding longitudinal variables in R).

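I have a data.table in which I want to recode the values of one variable (homes) based on the values of other variables. Basically, I want the observation with month == "1" and year == "2011" to take the value of homes found at month == "12" and year == "2010" for the same LA. The data.table looks like this: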

  > head(test,25)
                      LA month year entry exit total homes
 1: Barking and Dagenham    10 2010     2    0     2    NA
 2: Barking and Dagenham    11 2010     3    0     3    NA
 3: Barking and Dagenham    12 2010     3    0     3    15
 4: Barking and Dagenham     1 2011     6    0     6    NA
 5: Barking and Dagenham     2 2011     1    0     1    NA
 6: Barking and Dagenham     3 2011     2    0     2    NA
 7: Barking and Dagenham     4 2011     1    0     1    NA
 8: Barking and Dagenham    10 2011     1    0     1    NA
 9: Barking and Dagenham    11 2011     1    0     1    NA
10: Barking and Dagenham     1 2012     1    0     1    NA
11: Barking and Dagenham     9 2012     1    0     1    NA
12: Barking and Dagenham     6 2013     2    0     2    NA
13: Barking and Dagenham     1 2014     0    1    -1    NA
14: Barking and Dagenham    12 2014     0    1    -1    NA
15: Barking and Dagenham     3 2015     1    1     0    NA
16: Barking and Dagenham    11 2015     1    1     0    NA
17: Barking and Dagenham    12 2015     1    0     1    NA
18:               Barnet    11 2010    24    0    24    NA
19:               Barnet    12 2010    28    0    28    86
20:               Barnet     1 2011    28    0    28    NA
21:               Barnet     2 2011     6    0     6    NA
22:               Barnet     3 2011     1    0     1    NA
23:               Barnet     4 2011     1    0     1    NA
24:               Barnet     7 2011     2    0     2    NA
25:               Barnet     8 2011     1    0     1    NA
                      LA month year entry exit total homes

The structure of the data.table is as follows:

Classes ‘data.table’ and 'data.frame':  4664 obs. of  7 variables:
 $ LA   : Factor w/ 151 levels "Barking and Dagenham",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ month: int  10 11 12 1 2 3 4 10 11 1 ...
 $ year : int  2010 2010 2010 2011 2011 2011 2011 2011 2011 2012 ...
 $ entry: int  2 3 3 6 1 2 1 1 1 1 ...
 $ exit : int  0 0 0 0 0 0 0 0 0 0 ...
 $ total: int  2 3 3 6 1 2 1 1 1 1 ...
 $ homes: int  NA NA 15 NA NA NA NA NA NA NA ...

To recode homes, I created a new variable, homes.1, using the following data.table calls:

 test = test[year== "2011" & month == "1", homes.1 := as.numeric(!is.na(homes)), by = LA]

test[, homes.1 := ifelse(!is.na(homes.1),
                         test[month == "12" & year == "2010",homes],
                         homes.1), by=LA]

I get part of what I want. The variable homes.1 is recoded, but its values differ from the corresponding values of homes. The first 25 observations of test:

                      LA month year entry exit total homes homes.1
 1: Barking and Dagenham    10 2010     2    0     2    NA      NA
 2: Barking and Dagenham    11 2010     3    0     3    NA      NA
 3: Barking and Dagenham    12 2010     3    0     3    15      NA
 4: Barking and Dagenham     1 2011     6    0     6    NA      46
 5: Barking and Dagenham     2 2011     1    0     1    NA      NA
 6: Barking and Dagenham     3 2011     2    0     2    NA      NA
 7: Barking and Dagenham     4 2011     1    0     1    NA      NA
 8: Barking and Dagenham    10 2011     1    0     1    NA      NA
 9: Barking and Dagenham    11 2011     1    0     1    NA      NA
10: Barking and Dagenham     1 2012     1    0     1    NA      NA
11: Barking and Dagenham     9 2012     1    0     1    NA      NA
12: Barking and Dagenham     6 2013     2    0     2    NA      NA
13: Barking and Dagenham     1 2014     0    1    -1    NA      NA
14: Barking and Dagenham    12 2014     0    1    -1    NA      NA
15: Barking and Dagenham     3 2015     1    1     0    NA      NA
16: Barking and Dagenham    11 2015     1    1     0    NA      NA
17: Barking and Dagenham    12 2015     1    0     1    NA      NA
18:               Barnet    11 2010    24    0    24    NA      NA
19:               Barnet    12 2010    28    0    28    86      NA
20:               Barnet     1 2011    28    0    28    NA      55
21:               Barnet     2 2011     6    0     6    NA      NA
22:               Barnet     3 2011     1    0     1    NA      NA
23:               Barnet     4 2011     1    0     1    NA      NA
24:               Barnet     7 2011     2    0     2    NA      NA
25:               Barnet     8 2011     1    0     1    NA      NA
                      LA month year entry exit total homes homes.1

Likewise, the structure of test with homes.1 is:

> str(test)
Classes ‘data.table’ and 'data.frame':  4664 obs. of  8 variables:
 $ LA     : Factor w/ 151 levels "Barking and Dagenham",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ month  : int  10 11 12 1 2 3 4 10 11 1 ...
 $ year   : int  2010 2010 2010 2011 2011 2011 2011 2011 2011 2012 ...
 $ entry  : int  2 3 3 6 1 2 1 1 1 1 ...
 $ exit   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ total  : int  2 3 3 6 1 2 1 1 1 1 ...
 $ homes  : int  NA NA 15 NA NA NA NA NA NA NA ...
 $ homes.1: num  NA NA NA 46 NA NA NA NA NA NA ...
 - attr(*, ".internal.selfref")=<externalptr> 

I would like to know why homes.1 is not recoded with the correct values of homes. The expected output should look like this:

                      LA month year entry exit total homes homes.1
 1: Barking and Dagenham    10 2010     2    0     2    NA      NA
 2: Barking and Dagenham    11 2010     3    0     3    NA      NA
 3: Barking and Dagenham    12 2010     3    0     3    15      NA
 4: Barking and Dagenham     1 2011     6    0     6    NA      15
 5: Barking and Dagenham     2 2011     1    0     1    NA      NA
 6: Barking and Dagenham     3 2011     2    0     2    NA      NA
 7: Barking and Dagenham     4 2011     1    0     1    NA      NA
 8: Barking and Dagenham    10 2011     1    0     1    NA      NA
 9: Barking and Dagenham    11 2011     1    0     1    NA      NA
10: Barking and Dagenham     1 2012     1    0     1    NA      NA
11: Barking and Dagenham     9 2012     1    0     1    NA      NA
12: Barking and Dagenham     6 2013     2    0     2    NA      NA
13: Barking and Dagenham     1 2014     0    1    -1    NA      NA
14: Barking and Dagenham    12 2014     0    1    -1    NA      NA
15: Barking and Dagenham     3 2015     1    1     0    NA      NA
16: Barking and Dagenham    11 2015     1    1     0    NA      NA
17: Barking and Dagenham    12 2015     1    0     1    NA      NA
18:               Barnet    11 2010    24    0    24    NA      NA
19:               Barnet    12 2010    28    0    28    86      NA
20:               Barnet     1 2011    28    0    28    NA      86
21:               Barnet     2 2011     6    0     6    NA      NA
22:               Barnet     3 2011     1    0     1    NA      NA
23:               Barnet     4 2011     1    0     1    NA      NA
24:               Barnet     7 2011     2    0     2    NA      NA
25:               Barnet     8 2011     1    0     1    NA      NA
                      LA month year entry exit total homes homes.1

1 answer:

Answer 0: (score: 1)

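You get 0 because, for the rows where year == "2011" & month == "1", the expression is.na(homes) returns TRUE, so !is.na(homes) returns FALSE.
Then, because you create homes.1 by coercing those logical values to numeric, TRUE is automatically converted to 1 and FALSE to 0.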
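
The coercion described above can be checked in isolation. This is a minimal sketch on a made-up vector (the name homes_demo is illustrative and does not come from the original post):

```r
# Hypothetical stand-in for the homes column: missing, a value, missing
homes_demo <- c(NA, 15L, NA)

# !is.na() yields a logical vector; as.numeric() maps FALSE to 0 and TRUE to 1
flag <- as.numeric(!is.na(homes_demo))
print(flag)
```

Rows where homes is missing therefore end up as 0 in the new column, not as the December value.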

However, this is not necessarily a problem. You can replace these 0 values with another statement along these lines:

test[, homes.1 := ifelse(!is.na(homes.1),
                  test[month == "12" & year == "2010",homes],
                  homes.1),
       by=LA]

Can you tell me whether this works? If it does, you can simply merge the two columns.
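
One possible reason the values still differ, offered as a hedged reading rather than the answerer's own explanation: the inner test[month == "12" & year == "2010", homes] refers to the whole table, so by = LA does not restrict it, and ifelse then picks from that full vector by position within each group. A minimal sketch of an alternative, assuming one December 2010 row per LA, is an update join. The toy table and the names dec and dec_homes below are illustrative and not from the original post:

```r
library(data.table)

# Toy data mirroring a few rows of the question's table (hypothetical subset)
test <- data.table(
  LA    = factor(c(rep("Barking and Dagenham", 4), rep("Barnet", 3))),
  month = c(10L, 12L, 1L, 2L, 12L, 1L, 2L),
  year  = c(2010L, 2010L, 2011L, 2011L, 2010L, 2011L, 2011L),
  homes = c(NA, 15L, NA, NA, 86L, NA, NA)
)

# The ungrouped lookup returns one value per LA, not one value for the current group
print(test[month == 12 & year == 2010, homes])

# December 2010 value of homes for each LA, relabelled as January 2011
dec <- test[month == 12 & year == 2010,
            .(LA, month = 1L, year = 2011L, dec_homes = homes)]

# Update join: only rows matching LA + January 2011 receive a value;
# all other rows of the new column homes.1 stay NA
test[dec, on = .(LA, month, year), homes.1 := as.numeric(i.dec_homes)]

print(test)
```

On this toy data only the two January 2011 rows receive a value (15 and 86), and every other row of homes.1 stays NA, which matches the shape of the expected output in the question.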