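Multiplying columns by columns selected by name substring

Time: 2018-07-31 01:59:53

Tags: r dataframe

I'm still fairly new to R and have been struggling with what is probably a very simple problem.

My data has multiple columns that are named in a similar way. Here is some sample data: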

df = data.frame(PPID = 1:50, 
                time1 = sample(c(0,1), 50, replace = TRUE),
                time2 = sample(c(0,1), 50, replace = TRUE),
                time3 = sample(c(0,1), 50, replace = TRUE),
                condition1 = sample(c(0:3), 50, replace = TRUE),
                condition2 = sample(c(0:3), 50, replace = TRUE))

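In my actual data I have many more columns: about 50 time columns and about 10 condition columns.

I would like to multiply each time column by each condition column. In this sample data that should give me 6 additional columns: time1_condition1, time1_condition2, time2_condition1, time2_condition2, time3_condition1, time3_condition2.

I tried the solutions suggested in this thread, but they did not work (presumably because I don't understand how mapply / apply work and did not adapt them properly). I got a message saying that the longer argument is not a multiple of the length of the shorter.

Any help would be greatly appreciated!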

3 answers:

Answer 0: (score: 2)

# Indices of all columns whose names start with "time"
time_cols <- grep("^time", names(df))

# Indices of all columns whose names start with "condition"
condition_cols <- grep("^condition", names(df))

# Multiply each "time" column by all the "condition" columns
# and bind the results into a new data frame
new_df <- do.call("cbind", lapply(df[time_cols], function(x) x *
                                df[condition_cols]))

# Combine the original and the new data frame
complete_df <- cbind(df, new_df)

We can also generate the column names using expand.grid:

new_names <- do.call("paste0", 
        expand.grid(names(df)[condition_cols], names(df)[time_cols]))
colnames(complete_df)[7:12] <- new_names
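
The renaming step above relies on the fixed positions 7:12 and produces names such as condition1time1. As a minimal base R sketch (not part of the original answer), the same products can be built with names in the time1_condition1 form requested in the question, without hardcoding positions. The seed, the helper names time_names, cond_names, pairs, and products are assumptions introduced only for illustration.

```r
# Assumption: fixed seed so the illustration is reproducible
set.seed(1)
df <- data.frame(PPID = 1:50,
                 time1 = sample(c(0, 1), 50, replace = TRUE),
                 time2 = sample(c(0, 1), 50, replace = TRUE),
                 time3 = sample(c(0, 1), 50, replace = TRUE),
                 condition1 = sample(0:3, 50, replace = TRUE),
                 condition2 = sample(0:3, 50, replace = TRUE))

# Column names (not indices) matching each prefix
time_names <- grep("^time", names(df), value = TRUE)
cond_names <- grep("^condition", names(df), value = TRUE)

# One row per (time, condition) pair; the first argument varies fastest,
# so conditions cycle within each time column
pairs <- expand.grid(cond = cond_names, time = time_names,
                     stringsAsFactors = FALSE)

# Multiply the two columns named in each pair
products <- Map(function(t, c) df[[t]] * df[[c]], pairs$time, pairs$cond)
names(products) <- paste(pairs$time, pairs$cond, sep = "_")

# Append the new columns to the original data
complete_df <- cbind(df, as.data.frame(products))
names(complete_df)
```

Because the names are derived from the same pairs table that drives the multiplication, the labels stay aligned with the values however many time or condition columns the real data has.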

Answer 1: (score: 2)

Here is a tidyverse alternative:

library(tidyverse)
# Names of the "time" and "condition" columns
idx.time <- grep("time", names(df), value = TRUE)
idx.cond <- grep("condition", names(df), value = TRUE)
bind_cols(
    df,
    map_dfc(transpose(expand.grid(idx.time, idx.cond, stringsAsFactors = FALSE)),
        ~setNames(data.frame(df[, .x$Var1] * df[, .x$Var2]), paste(.x$Var1, .x$Var2, sep = "_"))))
#   PPID time1 time2 time3 condition1 condition2 time1_condition1
#1     1     1     0     1          3          0                3
#2     2     0     1     1          0          1                0
#3     3     0     1     1          0          2                0
#4     4     0     0     1          0          3                0
#5     5     0     0     0          0          3                0
#...

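Explanation: expand.grid creates all pairwise combinations of idx.time and idx.cond. transpose turns the list / data.frame inside out and returns a list, similar to apply(..., 1, as.list). map_dfc then operates on each element of that list and binds the results column-wise.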
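
To make the intermediate objects in that explanation concrete, here is a small sketch that looks at what expand.grid and transpose return before map_dfc is applied. It assumes the purrr package is installed; the variable names grid and pairs are introduced here for illustration only.

```r
library(purrr)

idx.time <- c("time1", "time2", "time3")
idx.cond <- c("condition1", "condition2")

# All pairwise combinations, one per row; Var1 (time) varies fastest
grid <- expand.grid(idx.time, idx.cond, stringsAsFactors = FALSE)
nrow(grid)
# 6

# transpose() turns the two columns into a list with one element per row,
# each element holding that row's Var1 and Var2
pairs <- transpose(grid)
length(pairs)
# 6
pairs[[2]]$Var1
# "time2"
pairs[[2]]$Var2
# "condition1"

# The column name the answer builds for that pair
paste(pairs[[2]]$Var1, pairs[[2]]$Var2, sep = "_")
# "time2_condition1"
```

Each element of pairs is what the formula in map_dfc receives as .x, which is why the answer can refer to .x$Var1 and .x$Var2.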

Answer 2: (score: 1)

Using

library(tidyverse)

a = df[grep("time",names(df))]
b = df[grep("condition",names(df))]

we can do:

 map(a,~.x*b)%>%
   bind_cols()%>%
   set_names(paste(rep(names(a),each=ncol(b)),names(b),sep="_"))

or we can do:

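# cross2() varies its first argument fastest, so the names must cycle
# through names(a) within each column of b
cross2(a,b)%>%
  map(lift(`*`))%>%
  set_names(paste(names(a),rep(names(b),each=ncol(a)),sep="_"))%>%
  data.frame()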

   time1_condition1 time2_condition1 time3_condition1 time1_condition2 time2_condition2 time3_condition2
1                 3                0                3                2                0                2
2                 3                3                0                1                1                0
3                 0                0                0                0                0                0
4                 3                3                0                0                0                0
5                 0                0                2                0                0                1
6                 0                0                1                0                0                1
7                 2                2                0                0                0                0