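I am trying to apply the same function with different arguments to a single column and save the results in a separate data.table, without updating or modifying the original one: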
library(data.table)
library(stringdist)  # assumed source of stringsim()
set.seed(43)
dt <- data.table(
  a = sample(c("aaa", "bbb", "ccc"), 15, replace = TRUE),
  year = sample(c("2015", "2018"), 15, replace = TRUE),
  b = sample(c("o", "r", "s", "c", "d", "f"), 15, replace = TRUE),
  variant = sample(c("osdcf", "osc", "offsco", "osc", "odfsc", "oc"), 15, replace = TRUE)
)
stringsim_methods <- c("lv", "osa", "dl", "lcs", "jw", "qgram")
for (x in stringsim_methods) {
  dt1 <- dt[, (x) := stringsim("oscdf", variant, method = x), by = .(variant, year)]
}
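However, because of how assignment by reference works, the original data.table is updated as well, and dt1 ends up containing all the other columns and rows of dt. The only workaround I have found is to initialise dt1 with one method and then compute the remaining methods in a for loop: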
dt1=dt[,.(lv=stringsim("oscdf",variant, method="lv")),by=.(variant,year)]
for (x in stringsim_methods) {
dt1=dt1[,(x):=stringsim("oscdf",variant, method=x)]
}
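For context, the by-reference behaviour described above can be reproduced on a toy table. This is a minimal sketch, not part of the original code: the table and the column names id, doubled and tripled are made up for illustration. It shows that := changes the table it is called on, while copy() gives an independent table.

```r
library(data.table)

# Toy table, unrelated to the data in the question (hypothetical names)
orig <- data.table(id = 1:3)

# `:=` adds the column to `orig` in place; the assignment to `alias`
# does not create a separate table
alias <- orig[, doubled := id * 2]

# copy() makes an independent table, so `tripled` never reaches `orig`
safe <- copy(orig)[, tripled := id * 3]

print(names(orig))
print(names(safe))
```

Copying the whole table still carries along every row and column, so it avoids the side effect but not the extra columns.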
Is there a more elegant way to get this result:
   variant year        lv       osa        dl       lcs        jw     qgram
1:   osdcf 2018 0.6000000 0.8000000 0.8000000 0.8000000 0.9333333 1.0000000
2:  offsco 2015 0.3333333 0.3333333 0.3333333 0.5454545 0.6972222 0.7272727
3:   osdcf 2015 0.6000000 0.8000000 0.8000000 0.8000000 0.9333333 1.0000000
4:   odfsc 2015 0.2000000 0.2000000 0.2000000 0.6000000 0.4666667 1.0000000
5:  offsco 2018 0.3333333 0.3333333 0.3333333 0.5454545 0.6972222 0.7272727
6:   odfsc 2018 0.2000000 0.2000000 0.2000000 0.6000000 0.4666667 1.0000000
7:      oc 2015 0.4000000 0.4000000 0.4000000 0.5714286 0.8000000 0.5714286
8:     osc 2018 0.6000000 0.6000000 0.6000000 0.7500000 0.8666667 0.7500000
9:     osc 2015 0.6000000 0.6000000 0.6000000 0.7500000 0.8666667 0.7500000
Thanks.
Answer 0 (score: 2)
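Two changes would make this tidier:
1. In the first step you do not actually seem to be summarising anything, so all you need is the unique combinations of the two variables.
2. You can replace the for loop with lapply in j.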
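# Placeholder that takes the same inputs as the real function
stringsim <- function(x, variant, method) 1
# Unique (variant, year) combinations; this is a new table, so dt is not touched
dt_red <- dt[, unique(.SD), .SDcols = c("variant", "year")]
# Add one column per method, by reference, to dt_red only
dt_red[, (stringsim_methods) := lapply(stringsim_methods, function(x)
  stringsim("oscdf", variant, method = x)), by = .(variant, year)]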
I don't know what your stringsim function does, so I just created a trivial function that takes the same inputs.
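The two suggestions can be tried end to end on a small made-up table. In this sketch toy_sim, dt_small and the labels m1 and m2 are hypothetical: toy_sim derives a Levenshtein-style similarity from base R's adist() and ignores its method argument, so both columns come out identical. Since toy_sim is vectorised over variant, the sketch leaves out by on the de-duplicated rows; whether that also holds for the real function is an assumption.

```r
library(data.table)

# Hypothetical stand-in for the real similarity function.
# `method` is accepted only to mirror the signature and is ignored.
toy_sim <- function(a, b, method) {
  1 - as.vector(adist(a, b)) / pmax(nchar(a), nchar(b))
}

# Made-up data with one duplicated (variant, year) pair
dt_small <- data.table(
  variant = c("osc", "osc", "oc", "osdcf"),
  year    = c("2015", "2015", "2018", "2018")
)
sim_labels <- c("m1", "m2")  # hypothetical method labels

# Step 1: unique combinations, stored in a new table
combos <- unique(dt_small[, .(variant, year)])

# Step 2: one new column per label, added by reference to `combos` only
combos[, (sim_labels) := lapply(sim_labels, function(m)
  toy_sim("oscdf", variant, method = m))]

print(combos)
print(names(dt_small))  # the original table keeps its two columns
```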
Answer 1 (score: 1)
Assuming your stringsim function looks something like this:
stringsim <- function(x,variant,method) paste(method, variant, sep = ":")
A working solution might be:
dt3 <- dt[,
lapply(stringsim_methods, function(x) stringsim("oscdf", variant, method = x)),
by = .(variant, year)]
data.table::setnames(dt3, 3:length(dt3), stringsim_methods)
Result:
> dt3
   variant year        lv        osa        dl        lcs        jw        qgram
1:   osdcf 2018  lv:osdcf  osa:osdcf  dl:osdcf  lcs:osdcf  jw:osdcf  qgram:osdcf
2:  offsco 2015 lv:offsco osa:offsco dl:offsco lcs:offsco jw:offsco qgram:offsco
3:   osdcf 2015  lv:osdcf  osa:osdcf  dl:osdcf  lcs:osdcf  jw:osdcf  qgram:osdcf
4:   odfsc 2015  lv:odfsc  osa:odfsc  dl:odfsc  lcs:odfsc  jw:odfsc  qgram:odfsc
5:  offsco 2018 lv:offsco osa:offsco dl:offsco lcs:offsco jw:offsco qgram:offsco
6:   odfsc 2018  lv:odfsc  osa:odfsc  dl:odfsc  lcs:odfsc  jw:odfsc  qgram:odfsc
7:      oc 2015     lv:oc     osa:oc     dl:oc     lcs:oc     jw:oc     qgram:oc
8:     osc 2018    lv:osc    osa:osc    dl:osc    lcs:osc    jw:osc    qgram:osc
9:     osc 2015    lv:osc    osa:osc    dl:osc    lcs:osc    jw:osc    qgram:osc
If you only want to "select" original or computed columns and store them in a new data.table, there is no need to use :=.
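As a variation on the same idea, the columns can be named inside j by wrapping the lapply result in setNames, which makes the separate setnames step unnecessary. This is a sketch under assumptions: toy_sim reuses the placeholder body from this answer, and dt_demo and sim_methods are made-up names for illustration.

```r
library(data.table)

# Placeholder with the same signature as stringsim() (hypothetical)
toy_sim <- function(x, variant, method) paste(method, variant, sep = ":")

# Made-up data with three distinct (variant, year) groups
dt_demo <- data.table(
  variant = c("osc", "oc", "osc"),
  year    = c("2015", "2015", "2018")
)
sim_methods <- c("lv", "osa")

# j returns a named list, so the result is a new table with named columns;
# no := is involved and dt_demo is left unchanged
res <- dt_demo[,
  setNames(lapply(sim_methods, function(m) toy_sim("oscdf", variant, method = m)),
           sim_methods),
  by = .(variant, year)]

print(res)
```

Naming the list in j keeps the column names next to the computation, at the cost of a slightly longer expression.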