Programmatically grouping by columns in data.table

Time: 2019-02-28 23:32:29

Tags: r data.table

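I want to use a character vector of data.table column names in by, together with an interaction (an expression such as carb %% 2) that defines the groups. The vector holds columns that are common to several data.tables, but each data.table also has some columns of its own. Is that possible? Example below.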

library(data.table)
mtcarsdt <- data.table(mtcars)
bycols <- c('cyl', 'gear')   # Defined for use across multiple data.tables
mtcarsdt[
  , .(mpg = mean(mpg)),      # This does not work.
  by = c('carb%%2', bycols)  # How can I make this work?
]
mtcarsdt[
  , .(mpg = mean(mpg)), 
  by = .(carb%%2, cyl, gear) # This works
]
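As far as I can tell, the first call fails because data.table reads a character vector in by as a list of column names and never evaluates the strings, so 'carb%%2' is looked up as a column literally named carb%%2. A small check, assuming only data.table and the built-in mtcars data:

```r
library(data.table)

mtcarsdt <- data.table(mtcars)
bycols <- c("cyl", "gear")

# A character vector of existing column names works fine in by
ok <- mtcarsdt[, .(mpg = mean(mpg)), by = bycols]
print(ok)

# "carb%%2" is not a column name, so data.table stops with an error
# instead of treating the string as an expression
bad <- try(mtcarsdt[, .(mpg = mean(mpg)), by = c("carb%%2", bycols)],
           silent = TRUE)
print(inherits(bad, "try-error"))
```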

3 answers:

Answer 0 (score: 1)

You can pass a three-way interaction vector as the by argument:

mtcarsdt[
    , .(mpg = mean(mpg)),
    # one factor combining carb %% 2 with every column named in bycols
    by = interaction(mtcars$carb %% 2, interaction(mtcars[, bycols]))
    ]

    interaction      mpg
 1:       0.6.4 19.75000
 2:       1.4.4 29.10000
 3:       1.6.3 19.75000
 4:       0.8.3 14.63333
 5:       0.4.4 24.75000
 6:       1.8.3 16.30000
 7:       1.4.3 21.50000
 8:       0.4.5 28.20000
 9:       0.8.5 15.40000
10:       0.6.5 19.70000
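The same idea can be written against mtcarsdt alone, which makes it easier to check that the single interaction factor produces the same groups as listing the three variables by hand. A minimal sketch, assuming the objects from the question; grp, res_int and res_ref are names introduced here for illustration:

```r
library(data.table)

mtcarsdt <- data.table(mtcars)
bycols <- c("cyl", "gear")

# Inner interaction(): given a single list (a data.table is one), it combines
# all of its columns, here cyl and gear.
# Outer interaction(): puts carb %% 2 in front, so labels look like "0.6.4".
grp <- interaction(mtcarsdt$carb %% 2,
                   interaction(mtcarsdt[, bycols, with = FALSE]))

# An external vector with one value per row is accepted in by
res_int <- mtcarsdt[, .(mpg = mean(mpg)), by = .(group = grp)]

# Reference: the hard-coded grouping that works in the question
res_ref <- mtcarsdt[, .(mpg = mean(mpg)), by = .(carb %% 2, cyl, gear)]
print(res_int)
```

The trade-off is that the groups come back as one combined label, so recovering cyl and gear as separate columns means splitting that string.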

Answer 1 (score: 1)

Here is a very intuitive approach:

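A minimal sketch of one direct way to do this; it is an assumption, not necessarily the code this answer originally showed. Each name in bycols becomes a symbol via as.name(), the table-specific expression is kept unevaluated with quote(), and the pieces are assembled into a .() call that by = eval(...) then uses. The output column name carb2 is chosen here for illustration.

```r
library(data.table)

mtcarsdt <- data.table(mtcars)
bycols <- c("cyl", "gear")

# Turn each shared column name into a symbol and append the table-specific
# expression, kept unevaluated with quote(); naming it sets the output column
by_call <- as.call(c(as.name("."),
                     lapply(bycols, as.name),
                     list(carb2 = quote(carb %% 2))))
# by_call is now the call .(cyl, gear, carb2 = carb %% 2)

# data.table evaluates the argument of eval() to get that call, then groups by it
res <- mtcarsdt[, .(mpg = mean(mpg)), by = eval(by_call)]
print(res)
```

Compared with building the call through parse(), nothing here is assembled from text, so column names containing spaces or operators need no escaping.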

Another option is to construct the whole expression, then parse and evaluate it:

mtcarsdt[, .(mpg = mean(mpg)), by = eval(as.call(parse(text = c(".", bycols, "carb %% 2"))))]
#    cyl gear carb      mpg
# 1:   6    4    0 19.75000
# 2:   4    4    1 29.10000
# 3:   6    3    1 19.75000
# 4:   8    3    0 14.63333
# 5:   4    4    0 24.75000
# 6:   8    3    1 16.30000
# 7:   4    3    1 21.50000
# 8:   4    5    0 28.20000
# 9:   8    5    0 15.40000
#10:   6    5    0 19.70000
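Because the question wants the same bycols vector reused across several data.tables, this parse()/as.call() construction can be wrapped in a function. grouped_mean() is a hypothetical helper written for illustration: value names the column to average, and extra holds table-specific grouping expressions as strings.

```r
library(data.table)

# Hypothetical helper (not part of data.table): mean of column `value`,
# grouped by the shared columns in `bycols` plus optional table-specific
# grouping expressions supplied as strings in `extra`.
grouped_mean <- function(dt, value, bycols, extra = character()) {
  # Same trick as the answer: parse the pieces and turn them into .(...)
  by_call <- as.call(parse(text = c(".", bycols, extra)))
  # data.table evaluates the argument of eval() in the calling frame,
  # i.e. inside this function, where by_call exists
  dt[, .(mean = mean(.SD[[1L]])), by = eval(by_call), .SDcols = value]
}

bycols <- c("cyl", "gear")            # shared grouping columns
cars <- data.table(mtcars)

res_carb <- grouped_mean(cars, "mpg", bycols, "carb %% 2")
res_am   <- grouped_mean(cars, "hp", bycols, "am")
res_base <- grouped_mean(cars, "mpg", bycols)
print(res_carb)
```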

You can also play the same parse/eval trick with bycols given as a single comma-separated string:

bycols = "cyl, gear"
eval(parse(text = paste0('mtcarsdt[, .(mpg = mean(mpg)), by = .(carb %% 2, ', bycols, ')]')))
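The string-based trick can also start from the shared character vector, since paste() with collapse turns c("cyl", "gear") into "cyl, gear". A short sketch; bycols_str and cmd are names introduced here for illustration:

```r
library(data.table)

mtcarsdt <- data.table(mtcars)
bycols <- c("cyl", "gear")

# Collapse the vector into the comma-separated form "cyl, gear"
bycols_str <- paste(bycols, collapse = ", ")

# Build the full data.table call as text, then parse and evaluate it
cmd <- paste0("mtcarsdt[, .(mpg = mean(mpg)), by = .(carb %% 2, ", bycols_str, ")]")
res <- eval(parse(text = cmd))
print(cmd)
print(res)
```

Building code from strings is fragile: a column name containing spaces or operators would have to be backquoted before it could be parsed.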

If you don't mind keeping the derived grouping variable as an actual column, and mainly care about the grouping, you can do the following:
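A sketch of that derived-column route, assuming the helper column is called carb2 (a name chosen here, not taken from the answer). Once carb %% 2 is stored as a real column, every grouping variable is a column name, so a plain character vector works in by:

```r
library(data.table)

mtcarsdt <- data.table(mtcars)
bycols <- c("cyl", "gear")

# Store the derived grouping variable as a real column (added by reference)
mtcarsdt[, carb2 := carb %% 2]

# Every grouping variable is now a column name, so a character vector works
res <- mtcarsdt[, .(mpg = mean(mpg)), by = c(bycols, "carb2")]
print(res)

# Remove the helper column if it should not stay in the table
mtcarsdt[, carb2 := NULL]
```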

Answer 2 (score: 0)

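This looks like a matter of splicing bycols and evaluating it in the right environment. I'm not very familiar with the data.table package, but since other answers already cover it, I can offer an alternative workflow that does what you want. The trick is to use the rlang !!! operator together with syms(): syms() turns the strings in bycols into symbols, and !!! splices them into the call so they are evaluated as column names. Grouping and summarising is then easy with dplyr.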

library(dplyr)
library(rlang)
bycols <- c("cyl", "gear")
mtcarsdt %>% mutate(carb2 = carb%%2) %>% 
  group_by(carb2, !!! syms(bycols)) %>% 
  summarise(m_mpg = mean(mpg))

Now bycols can be any columns you like.
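To see what !!! and syms() actually do, the splicing can be inspected without running the dplyr pipeline: rlang's expr() applies the same unquoting rules and returns the resulting call. For comparison, base R 4.0.0 and later can splice a list of symbols with bquote(..., splice = TRUE). A minimal sketch; tidy_call and base_call are illustrative names:

```r
library(rlang)

bycols <- c("cyl", "gear")

# syms() turns each string into a symbol: a list holding cyl and gear
bycol_syms <- syms(bycols)

# !!! splices that list into the call as separate arguments,
# exactly as if the column names had been typed by hand
tidy_call <- expr(group_by(carb2, !!!bycol_syms))
print(tidy_call)

# Base R (>= 4.0.0) equivalent: ..() splices when splice = TRUE
base_call <- bquote(group_by(carb2, ..(lapply(bycols, as.name))), splice = TRUE)
print(base_call)
```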