How to use an R object (a character vector) as column names when splitting a data.table column?

Posted: 2017-04-09 04:30:56

Tags: r data.table strsplit

I'm new to data.table. I'm trying to learn it and move from data.frame to data.table.

Right now I'm trying to split text into new columns, following the discussion here.

Here is what I'm trying to do.

Here is some sample data:

library(data.table)

# sample data.table
test <- data.table(POS = c(254, 280, 303,  22, 105, 173, 230, 235, 257, 258),
               value = c("0/1:15:3:123:12:478:-38.8484,0,-6.94934",
                         "0/0:15:15:577:0:0:0,-4.51545,-52.25",
                         "0/0:13:13:276:0:0:0,-3.91339,-25.0455",
                         "0/0:367:347:13643:0:0:0,-104.457,-1226.73",
                         "0/0:367:344:13145:5,0,1,0:168,0,41,0:0,-89.9158,-1166.99,-103.554,-1168.49,-1182.1,-100.161,-1165.11,-1178.71,-1178.41,-103.554,-1168.49,-1182.1,-1178.71,-1182.1",
                         "0/1:344:180:5411:156:4394:-294.227,0,-385.695",
                         "0/0:352:349:12289:1:12:0,-104.28,-1104.15",
                         "0/0:352:345:10691:1:12:0,-103.081,-960.583",
                         "0/0:352:351:13162:1:41:0,-101.868,-1179.6",
                         "0/0:352:349:12593:0:0:0,-105.059,-1132.45"))  

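I want to split value into separate columns on ":", giving the new columns specific names. The code below (which I learned from the link above) does exactly that.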

test[, c("GT", "DP", "RO", "QR", "AO", "QA", "GL") := tstrsplit(value, ":", fixed=TRUE)]
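To see why seven names are needed, it may help to look at what tstrsplit returns on its own. The minimal sketch below uses only the first two sample strings (x and parts are illustrative names, not from the question): tstrsplit splits each string and transposes the result, giving a list with one element per ":"-separated field, and := then assigns each element to the matching name on the left.

```r
library(data.table)

# Two of the sample strings; each has 7 ":"-separated fields
x <- c("0/1:15:3:123:12:478:-38.8484,0,-6.94934",
       "0/0:15:15:577:0:0:0,-4.51545,-52.25")

# One list element per field, each holding that field for every row
parts <- tstrsplit(x, ":", fixed = TRUE)

length(parts)   # 7, so the left-hand side of := needs 7 column names
parts[[1]]      # "0/1" "0/0"  (the future GT column)
```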

But is it possible to use an R object in place of the c(...) of names above? Something like this:

# new column names
namesForm <- c("GT", "DP", "RO", "QR", "AO", "QA", "GL")

and then use namesForm like this:

# use the namesForm as column names
test[, namesForm := tstrsplit(value, ":", fixed=TRUE)]

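This gives me a warning and a different result: a data.table with 3 variables, where the last one is a single list column named namesForm of length 10, filled by recycling the 7 vectors returned by tstrsplit.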

Warning message:
In `[.data.table`(test, , `:=`(namesForm, tstrsplit(value, ":",  :
Supplied 7 items to be assigned to 10 items of column 'namesForm' (recycled leaving remainder of 3 items).

So my question is: is it possible to use an R object/variable instead of an explicit c()?

1 answer:

Answer 0 (score: 2)

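You can use (namesForm) := instead of namesForm :=. Without the parentheses, data.table treats the bare symbol namesForm as the literal name of one new column; with them, data.table evaluates the variable and uses its value, the vector of seven names, as the column names.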
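A small toy example may make the difference clearer. As I understand data.table's rules for :=, a bare symbol on the left is always read as a literal column name, while anything else, such as (namesForm) or c(namesForm), is evaluated in the calling scope to produce a character vector of names. The tables dt1, dt2 and dt3 and the names "a" and "b" are hypothetical, chosen only for this sketch.

```r
library(data.table)

namesForm <- c("a", "b")   # hypothetical names for this toy example

# Bare symbol on the LHS: taken literally, so ONE column called "namesForm"
dt1 <- data.table(id = 1:2)
dt1[, namesForm := 0L]
names(dt1)   # "id" "namesForm"

# Parenthesised LHS: namesForm is evaluated, giving columns "a" and "b"
dt2 <- data.table(id = 1:2)
dt2[, (namesForm) := list(c(1L, 2L), c(3L, 4L))]
names(dt2)   # "id" "a" "b"

# c(namesForm) is an equivalent way to force evaluation of the LHS
dt3 <- data.table(id = 1:2)
dt3[, c(namesForm) := list(c(1L, 2L), c(3L, 4L))]
names(dt3)   # "id" "a" "b"
```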

Example:

test2 <- copy(test)
namesForm <- c("GT", "DP", "RO", "QR", "AO", "QA", "GL")

str(test[, c("GT", "DP", "RO", "QR", "AO", "QA", "GL") := tstrsplit(value, ":", fixed=TRUE)])
# Classes ‘data.table’ and 'data.frame':    10 obs. of  9 variables:
#  $ POS  : num  254 280 303 22 105 173 230 235 257 258
#  $ value: chr  "0/1:15:3:123:12:478:-38.8484,0,-6.94934" "0/0:15:15:577:0:0:0,-4.51545,-52.25" "0/0:13:13:276:0:0:0,-3.91339,-25.0455" "0/0:367:347:13643:0:0:0,-104.457,-1226.73" ...
#  $ GT   : chr  "0/1" "0/0" "0/0" "0/0" ...
#  $ DP   : chr  "15" "15" "13" "367" ...
#  $ RO   : chr  "3" "15" "13" "347" ...
#  $ QR   : chr  "123" "577" "276" "13643" ...
#  $ AO   : chr  "12" "0" "0" "0" ...
#  $ QA   : chr  "478" "0" "0" "0" ...
#  $ GL   : chr  "-38.8484,0,-6.94934" "0,-4.51545,-52.25" "0,-3.91339,-25.0455" "0,-104.457,-1226.73" ...
#  - attr(*, ".internal.selfref")=<externalptr> 

str(test2[, (namesForm) := tstrsplit(value, ":", fixed=TRUE)])
# Classes ‘data.table’ and 'data.frame':    10 obs. of  9 variables:
#  $ POS  : num  254 280 303 22 105 173 230 235 257 258
#  $ value: chr  "0/1:15:3:123:12:478:-38.8484,0,-6.94934" "0/0:15:15:577:0:0:0,-4.51545,-52.25" "0/0:13:13:276:0:0:0,-3.91339,-25.0455" "0/0:367:347:13643:0:0:0,-104.457,-1226.73" ...
#  $ GT   : chr  "0/1" "0/0" "0/0" "0/0" ...
#  $ DP   : chr  "15" "15" "13" "367" ...
#  $ RO   : chr  "3" "15" "13" "347" ...
#  $ QR   : chr  "123" "577" "276" "13643" ...
#  $ AO   : chr  "12" "0" "0" "0" ...
#  $ QA   : chr  "478" "0" "0" "0" ...
#  $ GL   : chr  "-38.8484,0,-6.94934" "0,-4.51545,-52.25" "0,-3.91339,-25.0455" "0,-104.457,-1226.73" ...
#  - attr(*, ".internal.selfref")=<externalptr>
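If you would rather not use := at all, tstrsplit also has a names argument that accepts a character vector, so the same namesForm object can name the pieces directly; type.convert = TRUE additionally converts numeric fields such as DP to integers instead of leaving every column as chr, as in the str output above. The sketch below runs on a two-row subset and assumes a data.table version whose tstrsplit supports both arguments; test3, parts and res are hypothetical names.

```r
library(data.table)

# Two-row subset of the sample data
test3 <- data.table(POS = c(254, 280),
                    value = c("0/1:15:3:123:12:478:-38.8484,0,-6.94934",
                              "0/0:15:15:577:0:0:0,-4.51545,-52.25"))
namesForm <- c("GT", "DP", "RO", "QR", "AO", "QA", "GL")

# Without :=, j returns the list already named by namesForm, so [ ] builds a
# new data.table; type.convert = TRUE turns all-digit fields into integers
parts <- test3[, tstrsplit(value, ":", fixed = TRUE,
                           names = namesForm, type.convert = TRUE)]

# Bind the new columns next to the original ones
res <- cbind(test3, parts)
names(res)   # "POS" "value" "GT" "DP" "RO" "QR" "AO" "QA" "GL"
```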