Question

I'm trying to do some pre/post comparisons. Here is a small sample:

library(data.table)

dataset = data.table(
  Key1 = c("p","q","r"),
  Key2 = c("a","b","c"),
  a_pre = c(3,2,6),
  b_pre = c(2,6,3),
  a_post = c(1,2,3),
  b_post = c(1,4,2)
  # etc.
)

dataset[,a_compare := a_pre/a_post]
dataset[,b_compare := b_pre/b_post]
#etc.

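The problem is that I have far more quantities than just a and b, and the set sometimes varies, so hand-coding each comparison isn't an option. I'm trying to avoid eval(parse()).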

Suppose I have the names of the quantities, c("a","b", etc.). My current thinking is along these lines:

loop through the quantity names
{
grep on colnames(dataset) for each quantity name, 
use above to subset the pre and post column. include the keys in this subset.
send this subsetted dataset to a function that calculates pre/post irrespective of the specific quantity
merge result of function back to original dataset
}

I feel there must be a better way to do this. Any ideas?

Answer 1

Using get instead of eval(parse(...)) is pretty standard here:

v = c("a", "b")
dataset[, paste0(v, "_compare") :=
            lapply(v, function(x) get(paste0(x, "_pre")) / get(paste0(x, "_post")))]
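If you would rather fetch all the pre and post columns at once, here is a minimal sketch of the same idea using mget and Map. It assumes the sample data from the question and that every name in v has both a _pre and a _post column; pre_cols and post_cols are names I made up for illustration.

```r
library(data.table)

# Sample data from the question
dataset <- data.table(
  Key1 = c("p", "q", "r"),
  Key2 = c("a", "b", "c"),
  a_pre = c(3, 2, 6),
  b_pre = c(2, 6, 3),
  a_post = c(1, 2, 3),
  b_post = c(1, 4, 2)
)

v <- c("a", "b")
pre_cols  <- paste0(v, "_pre")   # hypothetical helper names
post_cols <- paste0(v, "_post")

# mget() returns the named columns as a list; Map() pairs each pre column
# with the matching post column and divides them element-wise.
dataset[, paste0(v, "_compare") := Map(`/`, mget(pre_cols), mget(post_cols))]
```

Because both name vectors are built from v, the pre and post columns line up by construction.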

Answer 2

I find a for loop easier to write and to read:

basevars = c("a","b","c","d")
for (i in basevars)
    DT[, paste0(i,"_compare"):=get(paste0(i,"_pre"))/get(paste0(i,"_post"))]
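A variant of the loop that avoids get altogether, in case that matters to you: a minimal sketch using data.table's set() and plain [[ ]] column lookup. It assumes the sample data from the question and only loops over the quantities that actually exist there.

```r
library(data.table)

# Sample data from the question
DT <- data.table(
  Key1 = c("p", "q", "r"),
  Key2 = c("a", "b", "c"),
  a_pre = c(3, 2, 6),
  b_pre = c(2, 6, 3),
  a_post = c(1, 2, 3),
  b_post = c(1, 4, 2)
)

basevars <- c("a", "b")

for (x in basevars) {
  # DT[[name]] looks a column up by its string name, so no get() is needed;
  # set() adds the new column by reference.
  set(DT,
      j = paste0(x, "_compare"),
      value = DT[[paste0(x, "_pre")]] / DT[[paste0(x, "_post")]])
}
```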

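I've never really understood why R couldn't just define + to work on strings. It is currently an error, so it's not as if it's being used for anything: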

> "a"+"b"
Error in "a" + "b" : non-numeric argument to binary operator

Otherwise you could just write:

for (i in basevars)
    DT[, i+"_compare" := get(i+"_pre")/get(i+"_post")]
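You can get close to that today with a user-defined infix operator. A minimal sketch, assuming an operator called %+% (my own name for it, base R does not provide it) that simply wraps paste0, applied to the sample data from the question:

```r
library(data.table)

# Hypothetical string-concatenation operator; just a thin wrapper around paste0()
`%+%` <- function(a, b) paste0(a, b)

# Sample data from the question
DT <- data.table(
  Key1 = c("p", "q", "r"),
  Key2 = c("a", "b", "c"),
  a_pre = c(3, 2, 6),
  b_pre = c(2, 6, 3),
  a_post = c(1, 2, 3),
  b_post = c(1, 4, 2)
)

basevars <- c("a", "b")

for (x in basevars) {
  # Parentheses on the left-hand side make data.table evaluate the expression
  # to get the new column name.
  DT[, (x %+% "_compare") := get(x %+% "_pre") / get(x %+% "_post")]
}
```

Bear in mind that some packages define their own %+%, so a different name may be safer if you load many of them.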

Answer 3

Something like:

foo <- dataset[,grep("_pre", names(dataset))] / dataset[,grep("_post", names(dataset))]
names(foo) <- sub("pre", "comp", names(foo))

(I converted the data.table to a data.frame first. I don't know data.table, though I'm sure it's very useful.)
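For completeness, here is a self-contained sketch of that data.frame route with the conversion step written out. It assumes the sample data from the question and that the _pre and _post columns appear in the same order; df is just a name I picked for the converted copy. The patterns are anchored with $ so that only column-name suffixes match.

```r
library(data.table)

# Sample data from the question
dataset <- data.table(
  Key1 = c("p", "q", "r"),
  Key2 = c("a", "b", "c"),
  a_pre = c(3, 2, 6),
  b_pre = c(2, 6, 3),
  a_post = c(1, 2, 3),
  b_post = c(1, 4, 2)
)

# Work on a plain data.frame copy, as described above
df <- as.data.frame(dataset)

pre  <- grep("_pre$",  names(df))   # positions of the pre columns
post <- grep("_post$", names(df))   # positions of the post columns

# Column-wise division; relies on pre and post being in matching order.
# drop = FALSE keeps a data.frame even if only one quantity matches.
foo <- df[, pre, drop = FALSE] / df[, post, drop = FALSE]
names(foo) <- sub("_pre$", "_comp", names(foo))
```

If the column order is not guaranteed, building both name vectors from the list of quantity names is the safer choice.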

Avoiding eval(parse(text)) when operating on similarly named columns
