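I have a large data.frame, and I want to concatenate the values in each column together and then use the output to create a new data.frame. Since my data.frame has nearly 1700 columns, I figured the simplest approach would be to loop over the columns. Here is an example of what I want to do.
Starting values: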
variable1 = c("var1", "var2", "var3")
variable2 = c("var4", "var5", "var6")
variable3 = c("var7", "var8", "var9")
df = data.frame(variable1, variable2, variable3)
Expected output:
variable1 variable2 variable3
1 var1_var2 var4_var5 var7_var8
2 var1_var3 var4_var6 var7_var9
3 var2_var3 var5_var6 var8_var9
The code I am currently using is:
index = 1
column = 1
Complexes <- dim(df)[2]
proteins <- dim(df)[1]
complex <-list()
interactions <- list()
complexcol <- list()
for(i in 1:Complexes){
complex[[column]]=(for(j in 1:proteins){
for(k in j+1:proteins){
interactions[index] = c(paste0(corum[i,j],"_",corum[i,k]))
index = index +1
}
})
column = column + 1
print(column)
index = 1
}
When I run it, it iterates over the columns, but it does not produce any output in the new list or data.frame.
Thanks!
Answer 0: (score: 4)
You can use the combn function to get all combinations, which turns this operation into a one-liner:
# Build example data
(dat = data.frame(1:3, 4:6, 7:9))
# X1.3 X4.6 X7.9
# 1 1 4 7
# 2 2 5 8
# 3 3 6 9
# Get all combinations of rows
t(apply(combn(nrow(dat), 2), 2, function(x) paste0(dat[x[1],], "_", dat[x[2],])))
# [,1] [,2] [,3]
# [1,] "1_2" "4_5" "7_8"
# [2,] "1_3" "4_6" "7_9"
# [3,] "2_3" "5_6" "8_9"
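If you would rather get back a data.frame that keeps the original column names (the one-liner above returns an unnamed character matrix), something along these lines may work. This is only a sketch: pair_rows is a hypothetical helper name that is not part of the original answer, and it assumes the data has at least two rows.

```r
# pair_rows is a hypothetical helper name, not from the original answer.
pair_rows <- function(dat, sep = "_") {
  # 2 x choose(n, 2) matrix; each column holds one pair of row indices
  idx <- combn(nrow(dat), 2)
  out <- lapply(dat, function(col) {
    col <- as.character(col)  # also copes with factor columns
    paste(col[idx[1, ]], col[idx[2, ]], sep = sep)
  })
  as.data.frame(out, stringsAsFactors = FALSE)
}

dat <- data.frame(a = 1:3, b = 4:6, c = 7:9)
res <- pair_rows(dat)
res
```

Because the pairing is done on whole columns at once, this avoids indexing the data frame row by row, which may matter with around 1700 columns.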
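If your data frame stores factors and you want to combine their levels, you can convert it to a data frame that stores actual strings and then use the same code: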
# Make data frame with factors
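# stringsAsFactors = TRUE is needed on R >= 4.0, where the default changed to FALSE
(dat = data.frame(X=c("a", "b", "c"), Y=c("d", "e", "f"), Z=c("g", "h", "i"), stringsAsFactors=TRUE))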
# X Y Z
# 1 a d g
# 2 b e h
# 3 c f i
str(dat)
# 'data.frame': 3 obs. of 3 variables:
# $ X: Factor w/ 3 levels "a","b","c": 1 2 3
# $ Y: Factor w/ 3 levels "d","e","f": 1 2 3
# $ Z: Factor w/ 3 levels "g","h","i": 1 2 3
# Convert to data frame with strings and then use same code
dat2 <- data.frame(lapply(dat, as.character), stringsAsFactors=FALSE)
t(apply(combn(nrow(dat2), 2), 2, function(x) paste0(dat2[x[1],], "_", dat2[x[2],])))
# [,1] [,2] [,3]
# [1,] "a_b" "d_e" "g_h"
# [2,] "a_c" "d_f" "g_i"
# [3,] "b_c" "e_f" "h_i"
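As background on why the conversion helps: a factor is stored as integer codes plus a set of level labels, so code that ends up seeing the codes rather than the labels would paste numbers instead of strings. A minimal illustration, using a made-up factor f that is not part of the original answer:

```r
# f is an illustrative factor, not taken from the answer above
f <- factor(c("b", "a", "c"))

codes  <- as.integer(f)    # underlying integer codes (levels sort as a, b, c)
labels <- as.character(f)  # the level labels as plain strings

codes
labels
```

Converting each column with as.character up front, as in the answer, ensures the labels are what gets pasted.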
Answer 1: (score: 1)
I would like to add a contribution here using dplyr and data.table. Inspired by @David Arenburg, I came up with the following.
df <- data.frame(variable1 = c("var1", "var2", "var3"),
variable2 = c("var4", "var5", "var6"),
variable3 = c("var7", "var8", "var9"),
stringsAsFactors = FALSE)
library(dplyr)
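# mutate_each() and funs() are deprecated in current dplyr; across() replaces them.
# mutate() works here only because 3 rows happen to yield exactly 3 pairs.
mutate(df, across(everything(), ~ combn(.x, 2, paste, collapse = "_")))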
# variable1 variable2 variable3
#1 var1_var2 var4_var5 var7_var8
#2 var1_var3 var4_var6 var7_var9
#3 var2_var3 var5_var6 var8_var9
library(data.table)
setDT(df)[, lapply(.SD, function(x) {combn(x, 2, paste, collapse = "_")})]
# variable1 variable2 variable3
#1: var1_var2 var4_var5 var7_var8
#2: var1_var3 var4_var6 var7_var9
#3: var2_var3 var5_var6 var8_var9
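One thing to keep in mind with the column-wise combn approach: a column of n values yields choose(n, 2) pairs, so the result has the same number of rows as the input only when n is 3. A rough base R sketch with four rows, where df4 and its contents are made up for illustration:

```r
# df4 is illustrative data, not from the question
df4 <- data.frame(v1 = c("a", "b", "c", "d"),
                  v2 = c("A", "B", "C", "D"),
                  stringsAsFactors = FALSE)

# Pairs for a single column: choose(4, 2) = 6 values
pairs <- combn(df4$v1, 2, paste, collapse = "_")

# Apply the same call to every column and rebuild a data.frame
res4 <- as.data.frame(
  lapply(df4, function(x) combn(x, 2, paste, collapse = "_")),
  stringsAsFactors = FALSE
)
res4
```

Building the result with lapply and as.data.frame sidesteps any requirement that the output have as many rows as the input.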