Saving results exclusively in another data.table

Asked: 2018-09-10 19:26:06

Tags: r data.table

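I'm trying to apply the same function with different arguments to a single column and save the results in a separate data.table, without updating or modifying the original table: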

library(data.table)
library(stringdist)  # assumed source of stringsim(); the method names match this package
set.seed(43)
dt <- data.table(
  a       = sample(c("aaa", "bbb", "ccc"), 15, replace = TRUE),
  year    = sample(c("2015", "2018"), 15, replace = TRUE),
  b       = sample(c("o", "r", "s", "c", "d", "f"), 15, replace = TRUE),
  variant = sample(c("osdcf", "osc", "offsco", "osc", "odfsc", "oc"), 15, replace = TRUE)
)
stringsim_methods <- c("lv", "osa", "dl", "lcs", "jw", "qgram")
for (x in stringsim_methods) {
  # := adds column x to dt by reference, so dt1 is just another name for dt
  dt1 <- dt[, (x) := stringsim("oscdf", variant, method = x), by = .(variant, year)]
}

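However, because := assigns by reference, the original data.table is updated as well, and dt1 ends up containing all the other columns and rows of dt. The only way I have found around this is to initialize dt1 with one method and then compute the remaining methods in a for loop: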
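The by-reference behaviour described above can be seen in isolation with a small sketch. The tables and the `n` column here are made up for illustration and are not part of the question's data; `copy()` is data.table's function for taking an independent copy.

```r
library(data.table)

dt <- data.table(variant = c("osc", "oc"), year = c("2015", "2018"))

# := adds the column to dt itself; the value returned is that same table
same <- dt[, n := nchar(variant)]
"n" %in% names(dt)    # TRUE: the original was modified

# copy() creates an independent table, so := leaves the source untouched
src <- data.table(variant = c("osc", "oc"), year = c("2015", "2018"))
dup <- copy(src)[, n := nchar(variant)]
"n" %in% names(src)   # FALSE
```

Taking a copy is one way to keep the original intact, at the cost of duplicating every column, including the ones that are not wanted in the result.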

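# .() builds a new table, so dt is not touched here
dt1 <- dt[, .(lv = stringsim("oscdf", variant, method = "lv")), by = .(variant, year)]
# "lv" already exists, so only the remaining methods need to be added
for (x in stringsim_methods[-1]) {
  dt1[, (x) := stringsim("oscdf", variant, method = x)]
}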

Is there a more elegant way to get this result:

   variant year        lv       osa        dl       lcs        jw    qgram
1:   osdcf 2018 0.6000000 0.8000000 0.8000000 0.8000000 0.9333333 1.0000000
2:  offsco 2015 0.3333333 0.3333333 0.3333333 0.5454545 0.6972222 0.7272727
3:   osdcf 2015 0.6000000 0.8000000 0.8000000 0.8000000 0.9333333 1.0000000
4:   odfsc 2015 0.2000000 0.2000000 0.2000000 0.6000000 0.4666667 1.0000000
5:  offsco 2018 0.3333333 0.3333333 0.3333333 0.5454545 0.6972222 0.7272727
6:   odfsc 2018 0.2000000 0.2000000 0.2000000 0.6000000 0.4666667 1.0000000
7:      oc 2015 0.4000000 0.4000000 0.4000000 0.5714286 0.8000000 0.5714286
8:     osc 2018 0.6000000 0.6000000 0.6000000 0.7500000 0.8666667 0.7500000
9:     osc 2015 0.6000000 0.6000000 0.6000000 0.7500000 0.8666667 0.7500000

Thanks.

2 answers:

Answer 0 (score: 2)

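Two changes would make this tidier:

1. In the first step you are not really summarizing anything, so all you need is the unique combinations of the two variables.
2. You can replace the for loop with lapply in j.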

# placeholder with the same arguments as the real function
stringsim <- function(x, variant, method) 1
dt_red <- dt[, unique(.SD), .SDcols = c("variant", "year")]
dt_red[, (stringsim_methods) := lapply(stringsim_methods, function(x)
  stringsim("oscdf", variant, method = x)), by = .(variant, year)]


I don't know what your stringsim function does, so I just created a simple function that takes the same inputs.
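As a rough sketch of the same idea with a real similarity function: the method names in the question match those of `stringsim()` in the stringdist package, so the example below assumes that package is installed and is the intended source. The small `dt` is invented for illustration, and the `by` clause is dropped because `stringsim()` is vectorized over its second argument.

```r
library(data.table)
library(stringdist)  # assumption: the question's stringsim() comes from here

dt <- data.table(
  variant = c("osdcf", "osc", "osc", "oc"),
  year    = c("2018", "2015", "2015", "2015")
)
stringsim_methods <- c("lv", "osa", "dl", "lcs", "jw", "qgram")

# Unique (variant, year) pairs; this is a new table, independent of dt
dt_red <- unique(dt[, .(variant, year)])

# Add one similarity column per method to dt_red only
dt_red[, (stringsim_methods) := lapply(
  stringsim_methods,
  function(m) stringdist::stringsim("oscdf", variant, method = m)
)]

names(dt)  # dt still has only its original columns
```

Because `:=` is applied to `dt_red` rather than `dt`, the original table keeps its columns and rows.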

Answer 1 (score: 1)

Assuming your stringsim function looks something like this

stringsim <- function(x,variant,method) paste(method, variant, sep = ":")

a workable solution could be:

dt3 <- dt[,
          lapply(stringsim_methods, function(x) stringsim("oscdf", variant, method = x)),
          by = .(variant, year)]
data.table::setnames(dt3, 3:length(dt3), stringsim_methods)

Result

> dt3
   variant year        lv        osa        dl        lcs        jw        qgram
1:   osdcf 2018  lv:osdcf  osa:osdcf  dl:osdcf  lcs:osdcf  jw:osdcf  qgram:osdcf
2:  offsco 2015 lv:offsco osa:offsco dl:offsco lcs:offsco jw:offsco qgram:offsco
3:   osdcf 2015  lv:osdcf  osa:osdcf  dl:osdcf  lcs:osdcf  jw:osdcf  qgram:osdcf
4:   odfsc 2015  lv:odfsc  osa:odfsc  dl:odfsc  lcs:odfsc  jw:odfsc  qgram:odfsc
5:  offsco 2018 lv:offsco osa:offsco dl:offsco lcs:offsco jw:offsco qgram:offsco
6:   odfsc 2018  lv:odfsc  osa:odfsc  dl:odfsc  lcs:odfsc  jw:odfsc  qgram:odfsc
7:      oc 2015     lv:oc     osa:oc     dl:oc     lcs:oc     jw:oc     qgram:oc
8:     osc 2018    lv:osc    osa:osc    dl:osc    lcs:osc    jw:osc    qgram:osc
9:     osc 2015    lv:osc    osa:osc    dl:osc    lcs:osc    jw:osc    qgram:osc

If you only want to "select" original or computed columns and store them in a new data.table, there is no need to use :=.
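A possible variation on the same approach, sketched under the assumption that a named list returned from j becomes named columns: wrapping the lapply call in `setNames()` names the columns directly, so the separate `setnames()` step is not needed. The stub `stringsim` below mirrors the placeholder in this answer, not the real function, and the small `dt` is invented for illustration.

```r
library(data.table)

# Stand-in with the same signature as this answer's stub (not the real stringsim)
stringsim <- function(x, variant, method) paste(method, variant, sep = ":")

dt <- data.table(variant = c("osc", "oc", "osc"),
                 year    = c("2015", "2015", "2015"))
stringsim_methods <- c("lv", "osa")

# A named list in j yields named columns; no := and no setnames() afterwards
dt4 <- dt[, setNames(
            lapply(stringsim_methods,
                   function(m) stringsim("oscdf", variant, method = m)),
            stringsim_methods),
          by = .(variant, year)]

names(dt4)  # "variant" "year" "lv" "osa"
```

Since j returns a fresh list for each group, `dt` itself is never modified.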