After a (rather scary) dplyr pipeline, I ended up with a dataset like this:
year A B C [....] Z count.A count.B count.C [....] count.Z
1999 10 20 10 ... 6 3 5 67 ... 6
2000 3 5 5 ... 7 5 2 5 ... 5
Some example data to reproduce it:
df <- data.frame(year = c(1999, 2000),
A = c(10, 20),
B = c(3, 6),
C = c(1, 2),
count.A = c(1, 2),
count.B = c(8, 9),
count.C = c(5, 7))
What I actually need is to combine each column with its corresponding "count" column, i.e.
weight.A = A / count.A,
weight.B = B / count.B
I want to do this programmatically because I have hundreds of columns. Is there a way to do this in a dplyr pipeline?
Answer 0 (score: 3)
Don't store variables in column names. If you reshape the data so that it is tidy, the calculation becomes very simple:
library(tidyverse)
df %>% gather(var, val, -year) %>% # reshape to long
separate(var, c('var', 'letter'), fill = 'left') %>% # extract var from former col names
mutate(var = coalesce(var, 'value')) %>% # add name for unnamed var
spread(var, val) %>% # reshape back to wide
mutate(weight = value / count) # now this is very simple
#> year letter count value weight
#> 1 1999 A 1 10 10.0000000
#> 2 1999 B 8 3 0.3750000
#> 3 1999 C 5 1 0.2000000
#> 4 2000 A 2 20 10.0000000
#> 5 2000 B 9 6 0.6666667
#> 6 2000 C 7 2 0.2857143
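The same reshaping can also be written with pivot_longer(), which supersedes gather()/spread() in current tidyr. The following is only a sketch under a few assumptions: dplyr >= 1.0.0 and tidyr >= 1.0.0 are installed, the helper prefix "value." is made up for illustration, and no column name contains more than one dot.

```r
library(dplyr)
library(tidyr)

df <- data.frame(year = c(1999, 2000),
                 A = c(10, 20), B = c(3, 6), C = c(1, 2),
                 count.A = c(1, 2), count.B = c(8, 9), count.C = c(5, 7))

tidy <- df %>%
  # give the unprefixed columns an explicit (hypothetical) "value." prefix
  rename_with(~ paste0("value.", .x), .cols = !year & !starts_with("count.")) %>%
  # split "value.A" / "count.A" into a measure (".value") and a letter
  pivot_longer(-year, names_to = c(".value", "letter"), names_sep = "\\.") %>%
  mutate(weight = value / count)

tidy
```

The ".value" sentinel tells pivot_longer() to turn the first part of each name into its own column, so the separate gather/separate/spread steps collapse into one call.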
Answer 1 (score: 2)
If your columns are named consistently (and the names are easy to retrieve), you can do this easily with lapply:
cols <- c("A","B","C")
df[,paste0("weighted.",cols)] <- lapply(cols, function(x) df[,x] / df[, paste0("count.",x)])
# year A B C count.A count.B count.C weighted.A weighted.B weighted.C
#1 1999 10 3 1 1 8 5 10 0.3750000 0.2000000
#2 2000 20 6 2 2 9 7 10 0.6666667 0.2857143
Answer 2 (score: 1)
Assuming you can programmatically create a vector of all the column names, here is what I did for the example above:
for (c.name in c("A", "B", "C")) {
c.weight <- sprintf("weight.%s", c.name)
c.count <- sprintf("count.%s", c.name)
df[,c.weight] <- df[,c.name] / df[,c.count]
}
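If the result has to stay wide and inside a dplyr pipeline, as the question asks, one possible sketch is to build one expression per column and splice them into mutate(). This assumes dplyr and rlang are installed and that every base column X has a partner named count.X; the names cols, weight_exprs and result are illustrative only.

```r
library(dplyr)
library(rlang)

df <- data.frame(year = c(1999, 2000),
                 A = c(10, 20), B = c(3, 6), C = c(1, 2),
                 count.A = c(1, 2), count.B = c(8, 9), count.C = c(5, 7))

# derive the base column names from the "count." columns
cols <- sub("^count\\.", "", grep("^count\\.", names(df), value = TRUE))

# one unevaluated expression per column, e.g. .data[["A"]] / .data[["count.A"]]
weight_exprs <- lapply(cols, function(x) {
  expr(.data[[!!x]] / .data[[!!paste0("count.", x)]])
})
names(weight_exprs) <- paste0("weight.", cols)

result <- df %>%
  mutate(!!!weight_exprs)  # splice the named expressions in as new columns

result
```

Because the column names are derived from the data, the same code should scale to hundreds of column pairs without listing them by hand.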
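Answer 3 (score: 1)
Assuming the columns are in order, we can use data.table. Specify the columns of interest in .SDcols, divide one half of the Subset of Data.table (.SD) by the other half, and assign (:=) the result to new columns: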
library(data.table)
# names(df)[2:4] picks A, B, C (index 1 is year), so the new columns are named correctly
setDT(df)[, paste0("weighted.", names(df)[2:4]) := .SD[, 1:3] / .SD[, 4:6], .SDcols = A:count.C]
df
# year A B C count.A count.B count.C weighted.A weighted.B weighted.C
#1: 1999 10 3 1 1 8 5 10 0.3750000 0.2000000
#2: 2000 20 6 2 2 9 7 10 0.6666667 0.2857143
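The positional version above depends on the columns being in exactly the right order. A variant that selects by name instead could look like the sketch below; it assumes data.table is installed and that the count columns follow the count.X naming pattern, and the names dt, cols, cnt_cols and new_cols are illustrative only.

```r
library(data.table)

df <- data.frame(year = c(1999, 2000),
                 A = c(10, 20), B = c(3, 6), C = c(1, 2),
                 count.A = c(1, 2), count.B = c(8, 9), count.C = c(5, 7))

dt <- as.data.table(df)          # work on a copy, leave df untouched

cols     <- c("A", "B", "C")
cnt_cols <- paste0("count.", cols)
new_cols <- paste0("weighted.", cols)

# select both halves by name, divide them column by column, assign by reference
dt[, (new_cols) := dt[, ..cols] / dt[, ..cnt_cols]]

dt
```

Selecting by name keeps each numerator paired with its own count column even if the column order changes upstream.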