Combining data.table columns that contain NA

Time: 2015-11-17 18:47:29

Tags: r data.table

I have a set of five columns in a data.table.

library(data.table)

dt <- data.table(
  city = c(rep(1,2), rep(2,2), rep(3,2), rep(4,2)),
  neighborhoods.1 = c(NA, "a", "b", "c", NA, NA, "d", "e"),
  neighborhoods.2 = c(NA, "f", "g", rep(NA,5)),
  neighborhoods.3 = c(NA, "h", rep(NA, 6)),
  irrelevantdata = c(1:8)
)

   city neighborhoods.1 neighborhoods.2 neighborhoods.3 irrelevantdata
1:    1              NA              NA              NA              1
2:    1               a               f               h              2
3:    2               b               g              NA              3
4:    2               c              NA              NA              4
5:    3              NA              NA              NA              5
6:    3              NA              NA              NA              6
7:    4               d              NA              NA              7
8:    4               e              NA              NA              8

I want to combine the first four columns into a single column.

   neighborhood
1:    1
2:    1-a-f-h
3:    2-b-g
4:    2-c
5:    3
6:    3
7:    4-d
8:    4-e

As you can see, I am dropping the NA entries and separating the remaining values with -.

I tried the following, which runs into an obvious problem with how j is processed:

business[
    , 
    neighborhood = paste0(
      city,
      if(!is.na(neighborhoods.1)) paste0("-", neighborhoods.1),
      if(!is.na(neighborhoods.2)) paste0("-", neighborhoods.2),
      if(!is.na(neighborhoods.3)) paste0("-", neighborhoods.3),       
      ""
    )
]

How can I get this to work?

Updated to reflect that there are other columns I do not want to merge.

1 answer:

Answer 0: (score: 5)

One option is to paste the elements of each row together using do.call, then remove the NA elements along with the extra - from the output vector.

dt[,.(neighborhood = gsub('-NA|NA-', '', 
   do.call(paste, c(.SD, sep='-')))), .SDcols= city:neighborhoods.3]
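To make the two steps visible, here is a minimal sketch that runs the paste and the gsub separately on the question's data. The names `pasted`, `cleaned`, and `damaged`, and the value "1-NAPA", are made up for illustration and are not part of the original post. The last line shows a limitation worth keeping in mind: the pattern cannot tell a missing value from a real string that happens to begin with "NA".

```r
library(data.table)

dt <- data.table(
  city = c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2)),
  neighborhoods.1 = c(NA, "a", "b", "c", NA, NA, "d", "e"),
  neighborhoods.2 = c(NA, "f", "g", rep(NA, 5)),
  neighborhoods.3 = c(NA, "h", rep(NA, 6)),
  irrelevantdata = 1:8
)

# Step 1: paste row-wise. paste() turns each NA into the literal string "NA",
# so the first row becomes "1-NA-NA-NA".
pasted <- dt[, do.call(paste, c(.SD, sep = "-")), .SDcols = city:neighborhoods.3]

# Step 2: strip the "-NA" / "NA-" fragments that step 1 left behind.
cleaned <- gsub("-NA|NA-", "", pasted)

# Hypothetical value (not in the question's data): a genuine name starting
# with "NA" is hit by the same pattern and loses its first characters.
damaged <- gsub("-NA|NA-", "", "1-NAPA")
```

If the real data could contain such values, the na.omit-based approach is likely the safer choice, since it drops missing values before anything is pasted.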

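Another option is to group by the row sequence, unlist the Subset of Data.table (.SD), remove the NA elements (na.omit), and paste the remaining elements together. We can specify the columns to use for this operation in .SDcols.
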
dt[, .(neighborhood = paste(na.omit(unlist(.SD)), collapse='-')),
   by=1:nrow(dt), .SDcols= city:neighborhoods.3]
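
Since the question mentions other columns that should not be merged, a variant of the same idea can add the result by reference with := instead of returning a new table. This is only a sketch on the question's data, not part of the original answer; the existing columns, including irrelevantdata, stay in place.

```r
library(data.table)

dt <- data.table(
  city = c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2)),
  neighborhoods.1 = c(NA, "a", "b", "c", NA, NA, "d", "e"),
  neighborhoods.2 = c(NA, "f", "g", rep(NA, 5)),
  neighborhoods.3 = c(NA, "h", rep(NA, 6)),
  irrelevantdata = 1:8
)

# One group per row: unlist() coerces the row's values to character,
# na.omit() drops the missing ones, paste() joins what is left.
# := adds the column to dt in place, so no other column is dropped.
dt[, neighborhood := paste(na.omit(unlist(.SD)), collapse = "-"),
   by = 1:nrow(dt), .SDcols = city:neighborhoods.3]
```

Grouping by row calls j once per row, so this may be noticeably slower than the do.call(paste, ...) version on large tables.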

Yet another option, suggested by @Frank, is to melt a subset of the dataset (specified by the required columns) into long format and then paste.

mycols <- setdiff(names(dt), 'irrelevantdata')
na.omit(melt(dt[, mycols, with=FALSE][, r := .I],
     id.vars="r"))[, paste(value, collapse="-"), by=r]
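
The melt approach can also be written out step by step. The sketch below is one way to do it under two assumptions of mine that are not in the original answer: city is converted to character first so that all melted columns share one type, and the result column is named neighborhood rather than the default V1. The names `sub`, `long`, and `res` are illustrative.

```r
library(data.table)

dt <- data.table(
  city = c(rep(1, 2), rep(2, 2), rep(3, 2), rep(4, 2)),
  neighborhoods.1 = c(NA, "a", "b", "c", NA, NA, "d", "e"),
  neighborhoods.2 = c(NA, "f", "g", rep(NA, 5)),
  neighborhoods.3 = c(NA, "h", rep(NA, 6)),
  irrelevantdata = 1:8
)

# Work on a subset holding only the columns to combine; dt itself is untouched.
mycols <- setdiff(names(dt), "irrelevantdata")
sub <- dt[, mycols, with = FALSE]
sub[, city := as.character(city)]  # assumption: align types before melting
sub[, r := .I]                     # row id, so values can be regrouped later

# Long format: one row per (row id, column) pair; rows with NA values dropped.
long <- na.omit(melt(sub, id.vars = "r"))

# Paste the surviving values back together for each original row.
res <- long[, .(neighborhood = paste(value, collapse = "-")), by = r]
```

Because the city rows come first in the melted table and are never NA here, the groups come back in the original row order.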