R: collapse multiple rows into one row using a specific function per column, including character columns

Asked: 2017-11-11 04:00:53

Tags: r data.table

library(data.table)
library(lubridate)

x1 <- c(20090101, "2009-01-02", "2009 01 03", "2009-1-4",
       "2009-1, 5", "Created on 2009 1 6", "200901 !!! 07")

dt2 <- data.table(id = c(1,1,1,2,2,2,2), date1 = ymd(x1), charval = c("aa","vv","ss","a","b","c","d"))

   id      date1 charval
1:  1 2009-01-01      aa
2:  1 2009-01-02      vv
3:  1 2009-01-03      ss
4:  2 2009-01-04       a
5:  2 2009-01-05       b
6:  2 2009-01-06       c
7:  2 2009-01-07       d

I use the following code to group by id:

dt3 <- dt2[, Map(function(x,y) ifelse(x != "paste", get(x)(y, na.rm = TRUE), paste(y, sep = ";")), 
                              setNames(c("mean", "paste"), names(.SD)), .SD), by = id]

I expect to get something like this:

   id      date1 charval
1:  1 2009-01-02      aa;vv;ss
2:  2 2009-01-05      a;b;c;d

But what I actually see is this:

   id date1 charval
1:  1    NA      aa
2:  2    NA       a

1) I don't understand why paste doesn't work. 2) I don't understand why mean(date1) doesn't work, because, for example, the following code works fine:

mean(dt2$date1)
[1] "2009-01-04"

1 answer:

Answer 0: (score: 1)

It is not clear why we have to go through Map/get. After grouping by 'id', get the mean of 'date1' and paste the 'charval' values together:

dt2[, .(date1 = mean(date1), charval = toString(charval)), id]
#    id      date1    charval
#1:  1 2009-01-02 aa, vv, ss
#2:  2 2009-01-05 a, b, c, d

Note: toString is equivalent to paste(..., collapse=', '). For a different separator, call paste with collapse directly:

dt2[, .(date1 = mean(date1), charval = paste(charval, collapse=";")), id]
#   id      date1  charval
#1:  1 2009-01-02 aa;vv;ss
#2:  2 2009-01-05  a;b;c;d
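
As a quick check of the note above, the following base R sketch compares toString with an explicit paste(..., collapse = ", ") on a small vector. The name `vals` is made up for this illustration and is not part of the original question.

```r
# Illustrative vector; `vals` is a hypothetical name, not from the question.
vals <- c("aa", "vv", "ss")

# toString() joins the elements with ", " ...
joined_default <- toString(vals)

# ... which matches paste() with collapse = ", ".
joined_paste <- paste(vals, collapse = ", ")

# Any other separator needs paste() with an explicit collapse.
joined_semicolon <- paste(vals, collapse = ";")

print(joined_default)                           # "aa, vv, ss"
print(identical(joined_default, joined_paste))  # TRUE
print(joined_semicolon)                         # "aa;vv;ss"
```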

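The OP's problem is that Map uses get to call mean. This seems to trigger

if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
    warning("argument is not numeric or logical: returning NA")
    return(NA_real_)
}

and returns NA because 'date1' is found to be of class Date, even though it is stored as numeric. One option is to specify envir in get.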
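
The check quoted above can be reproduced in isolation. The sketch below calls mean.default directly on a Date vector to show the NA, next to the normal generic call. It only illustrates that check; whether this is exactly the path taken inside the grouped Map call remains the answer's conjecture. The names `d`, `via_generic` and `via_default` are made up for the example.

```r
# Small Date vector for illustration (hypothetical name `d`).
d <- as.Date(c("2009-01-01", "2009-01-02", "2009-01-03"))

# A Date is stored as a double, but is.numeric() reports FALSE for it.
print(typeof(d))      # "double"
print(is.numeric(d))  # FALSE

# The generic dispatches to the Date method and keeps the class.
via_generic <- mean(d)
print(via_generic)    # "2009-01-02"

# Calling the default method directly fails the numeric/logical check
# and returns NA with a warning (suppressed here to keep the output clean).
via_default <- suppressWarnings(mean.default(d))
print(via_default)    # NA
```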

Another problem is the use of ifelse. It is better to use if/else, since there are only two elements:

dt2[, Map(function(x, y)  if(x != "paste") get(x, envir = parent.frame())(y, na.rm = TRUE) 
  else paste(y, collapse=':'), setNames(c("mean", "paste"), names(.SD)), .SD), by = id]
#    id      date1  charval
#1:  1 2009-01-02 aa:vv:ss
#2:  2 2009-01-05  a:b:c:d
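
The two differences between this call and the one in the question (if/else instead of ifelse, and collapse instead of sep) can be seen in plain base R. This is a minimal sketch on made-up values; `y`, `d` and the result names are hypothetical.

```r
# Illustrative character vector (hypothetical name `y`).
y <- c("aa", "vv", "ss")

# sep only separates multiple arguments, so a single vector is not joined.
pasted_sep <- paste(y, sep = ";")
print(length(pasted_sep))       # 3

# collapse joins the elements of the vector into one string.
pasted_collapse <- paste(y, collapse = ";")
print(pasted_collapse)          # "aa;vv;ss"

# ifelse() returns a result shaped like its test, so a length-1 test
# keeps only the first element of `yes`.
from_ifelse <- ifelse(TRUE, y, "unused")
print(from_ifelse)              # "aa"

# if/else returns the whole value of the chosen branch.
from_if <- if (TRUE) y else "unused"
print(length(from_if))          # 3

# ifelse() also drops attributes, so a Date comes back as a plain number.
d <- as.Date("2009-01-02")
print(class(ifelse(TRUE, d, NA)))  # "numeric"
```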

get is a bit tricky; if the correct environment is specified, it works as expected:

get("mean")(dt2$date1)
#[1] "2009-01-04"
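
To illustrate why the environment matters, the sketch below shows get returning different objects for the same name depending on envir. It is a general demonstration of how get resolves names, not a diagnosis of what happens inside the grouped data.table call. The environment `shadow_env` and its dummy function are invented for the example.

```r
# Hypothetical environment that holds its own object called "mean".
shadow_env <- new.env()
assign("mean", function(x, ...) "shadowed", envir = shadow_env)

# Looking the name up in shadow_env finds the dummy function first.
from_shadow <- get("mean", envir = shadow_env)(1:3)
print(from_shadow)  # "shadowed"

# Looking it up in the base environment finds the real mean().
from_base <- get("mean", envir = baseenv())(1:3)
print(from_base)    # 2
```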

Or, instead of the if/else on the "paste" string, we can check the class of the column: if it is character, do the paste, otherwise return the mean:

dt2[, Map(function(x, y)  if(is.character(y)) get(x)(y, collapse=":") 
     else get(x, envir = parent.frame())(y, na.rm = TRUE),
     setNames(c("mean", "paste"), names(.SD)), .SD), by = id]
#   id      date1  charval
#1:  1 2009-01-02 aa:vv:ss
#2:  2 2009-01-05  a:b:c:d
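
The same class-based choice can be written without get at all. The following is a minimal base R sketch under that assumption; the helper name `summarise_col` and the list `cols` are hypothetical and not from the original answer.

```r
# Hypothetical helper: collapse character columns, average everything else.
summarise_col <- function(y, sep = ":") {
  if (is.character(y)) paste(y, collapse = sep) else mean(y, na.rm = TRUE)
}

# Stand-in for one group of the example data (hypothetical name `cols`).
cols <- list(
  date1   = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03")),
  charval = c("aa", "vv", "ss")
)

# Apply the helper to each column.
out <- lapply(cols, summarise_col)
print(out$date1)    # "2009-01-02"
print(out$charval)  # "aa:vv:ss"
```

With data.table, a helper like this should be usable per group as dt2[, lapply(.SD, summarise_col), by = id], which avoids looking functions up by name.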

Note that it is best not to use the first approach lightly.