Operate pairwise on two columns in a data.table and replace them with newly named columns

Asked: 2017-02-15 13:01:16

Tags: r shiny data.table

Suppose I have a large data.table that looks like this:

Sequence A1 B1 A2 B2
s1        0  2  9 11
s2        1  3  3  2
s3        2  2  4  1
s4        3  5  4 14
s5        3  7  2  0
s6        0  2  8  5
...

I want to compute operations such as log2(A2/A1) and log2(B2/B1) and return a data.table with columns named "A2/A1" and "B2/B1", like this:

Sequence A2/A1      B2/B1
s1       log2(9/0)  log2(11/2)
s2       log2(3/1)  log2(2/3)
s3       log2(4/2)  log2(1/2)
s4       log2(4/3)  log2(14/5)
s5       log2(2/3)  log2(0/7)
s6       log2(8/0)  log2(5/2)

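I have already found a workaround, but it does not really work for my case. Because the columns are selected dynamically (in the UI), I can't really use it, and I still get all the columns back (A1, B1, A2, B2 as well as A2/A1 and B2/B1).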

# helpText() is placed next to each selectInput() rather than inside it,
# where it would be matched positionally to the `selected` argument.
selectInput("firstSelection", "Select First Factor", choices = "", multiple = TRUE),
helpText("First parameter for the calculation of Regulation-Factor"),
selectInput("secondSelection", "Select Second Factor", choices = "", multiple = TRUE),
helpText("Second parameter for the calculation of Regulation-Factor")

Here is my workaround:

input_table <<- getData()[, paste(input$secondSelection, input$firstSelection,sep= "/"):=
list(get(input$secondSelection[1])/get(input$firstSelection[1]),
get(input$secondSelection[2])/get(input$firstSelection[2]))]

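I suspect there must be a better way, perhaps using functions like apply or special symbols such as .I, .SD, and .SDcols. I have read about them but still don't really understand how and when to use them.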

1 Answer:

Answer 0 (score: 1)

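We can do this with the set function. Create a result dataset ('res') that holds the first column, 'Sequence', from the original dataset plus two columns filled with NA. Then loop over the indices in 'j1' and set the values of those columns: subset the matching columns of 'dt1', divide them, and take log2 of the result.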

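# Result table: the id column plus one NA-filled placeholder per ratio
res <- data.table(Sequence = dt1$Sequence, A2A1= NA_real_, B2B1=NA_real_)
# Number of distinct column prefixes (A, B) gives the target positions 2 and 3
j1 <- as.integer(seq_len(uniqueN(sub("\\d+", "", names(dt1)[-1]))) + 1)

# For column j of res, divide dt1 column j+2 (A2 or B2) by column j (A1 or B1);
# the offset of 2 relies on the A1, B1, A2, B2 column order of this example
for(j in j1){
  set(res, i = NULL, j= j, value = log2(dt1[[j+2]]/dt1[[j]]))
}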
res
#    Sequence       A2A1       B2B1
#1:       s1        Inf  2.4594316
#2:       s2  1.5849625 -0.5849625
#3:       s3  1.0000000 -1.0000000
#4:       s4  0.4150375  1.4854268
#5:       s5 -0.5849625       -Inf
#6:       s6        Inf  1.3219281

log2(9/0)
#[1] Inf
log2(11/2)
#[1] 2.459432
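
For the dynamic case in the question, where the column names come from the UI, the same set() loop can be driven by two character vectors instead of fixed positions. The following is only a minimal sketch and not part of the original answer: the helper name log2_ratios and its arguments first, second, and id are hypothetical, with first and second standing in for input$firstSelection and input$secondSelection.

```r
library(data.table)

# Hypothetical helper (not from the original answer). `first` and `second` are
# equal-length character vectors of column names, paired by position.
log2_ratios <- function(dt, first, second, id = "Sequence") {
  stopifnot(length(first) == length(second),
            all(c(id, first, second) %in% names(dt)))
  # Start from the id column only, so the source columns are not carried over.
  res <- dt[, id, with = FALSE]
  new_names <- paste(second, first, sep = "/")
  for (k in seq_along(first)) {
    # Add one ratio column per pair, by reference.
    set(res, j = new_names[k], value = log2(dt[[second[k]]] / dt[[first[k]]]))
  }
  res
}

# Same example data as in the answer.
dt1 <- data.table(
  Sequence = c("s1", "s2", "s3", "s4", "s5", "s6"),
  A1 = c(0L, 1L, 2L, 3L, 3L, 0L), B1 = c(2L, 3L, 2L, 5L, 7L, 2L),
  A2 = c(9L, 3L, 4L, 4L, 2L, 8L), B2 = c(11L, 2L, 1L, 14L, 0L, 5L)
)

out <- log2_ratios(dt1, first = c("A1", "B1"), second = c("A2", "B2"))
names(out)  # columns: Sequence, A2/A1, B2/B1
```

Because the result starts from the id column alone, A1, B1, A2, and B2 do not appear in the output, and dt1 itself is left unmodified.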

Data

dt1 <- structure(list(Sequence = c("s1", "s2", "s3", "s4", "s5", "s6"
 ), A1 = c(0L, 1L, 2L, 3L, 3L, 0L), B1 = c(2L, 3L, 2L, 5L, 7L, 
 2L), A2 = c(9L, 3L, 4L, 4L, 2L, 8L), B2 = c(11L, 2L, 1L, 14L, 
 0L, 5L)), .Names = c("Sequence", "A1", "B1", "A2", "B2"), 
 class = "data.frame", row.names = c(NA, -6L))
setDT(dt1)
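
The question also asks how .SD and .SDcols fit in. As a rough illustration that is not taken from the original answer, .SDcols can pick out each group of columns by name, and Map can then walk the two groups pairwise. The variables first and second below are assumed stand-ins for input$firstSelection and input$secondSelection.

```r
library(data.table)

dt1 <- data.table(
  Sequence = c("s1", "s2", "s3", "s4", "s5", "s6"),
  A1 = c(0L, 1L, 2L, 3L, 3L, 0L), B1 = c(2L, 3L, 2L, 5L, 7L, 2L),
  A2 = c(9L, 3L, 4L, 4L, 2L, 8L), B2 = c(11L, 2L, 1L, 14L, 0L, 5L)
)

first  <- c("A1", "B1")  # assumed stand-in for input$firstSelection
second <- c("A2", "B2")  # assumed stand-in for input$secondSelection

# .SD is "the subset of data"; .SDcols restricts it to the named columns.
num <- dt1[, .SD, .SDcols = second]  # numerator columns A2, B2
den <- dt1[, .SD, .SDcols = first]   # denominator columns A1, B1

# A data.table is a list of columns, so Map pairs the columns by position.
ratios <- unname(Map(function(n, d) log2(n / d), num, den))

# Build the result from the id column and add the ratio columns by reference.
res <- data.table(Sequence = dt1$Sequence)
new_names <- paste(second, first, sep = "/")
res[, (new_names) := ratios]
```

Wrapping new_names in parentheses on the left of := tells data.table to use the names stored in the variable rather than create a column literally called new_names.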