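I want all possible combinations of certain columns in a data frame, taken 2 at a time, without repetition (order does not matter). For each pair I want to append a new column whose name is the two column names concatenated and whose values are the two columns pasted together. For example: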
df
col1 col2 col3 col4
ind1 a    c    a
ind2 c    g    a
ind3 a    g    t
I would like to get:
newdf
col1 col2 col3 col4 col2col3 col2col4 col3col4
ind1 a    c    a    ac       aa       ca
ind2 c    g    a    cg       ca       ga
ind3 a    g    t    ag       at       gt
I tried the following:
cl <- c("col2", "col3", "col4") #vector with the columns I want
library(gtools)
lg <- length(cl)
cmb <- combinations(lg, 2, cl) #this gives me all the combinations without repetition
cmb
     [,1]   [,2]
[1,] "col2" "col3"
[2,] "col2" "col4"
[3,] "col3" "col4"
cmb <- paste(cmb[,1],cmb[,2]) #for joining the columns of cmb
cmb1 <- paste0("df$",cmb[,1], ", df$", cmb[,2])
After that, I tried to use sapply, but I could not get it to work. This is one of many attempts:
newdf <- sapply(cmb1, function(x) {
df$[,x] <- paste0(x)
})
Is there a better way to do this?
Answer 0 (score: 1)
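One approach is to use mapply() over predefined lists of columns. First, you need a matrix of column-name pairs, like the one you already built. You can also use combn() from base R to do this: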
> df <- data.frame(col2 = c("a", "c", "a"), col3 = c("c", "g", "g"), col4 = c("a", "a", "t"), stringsAsFactors = FALSE)
> nombres <- combn(colnames(df), 2)
> nombres
     [,1]   [,2]   [,3]
[1,] "col2" "col2" "col3"
[2,] "col3" "col4" "col4"
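A note on layout, since it is easy to trip over: combn() returns one pair per column, while the matrix in the question has one pair per row. The following is a minimal base R sketch using the cl vector from the question; the names pairs_by_col, pairs_by_row and n_pairs are only illustrative.

```r
# Column names taken from the question
cl <- c("col2", "col3", "col4")

# combn() stores each pair in a column: a 2 x 3 character matrix
pairs_by_col <- combn(cl, 2)

# Transposing gives one pair per row: a 3 x 2 character matrix
pairs_by_row <- t(pairs_by_col)

# choose() gives the number of unordered pairs to expect
n_pairs <- choose(length(cl), 2)
```

Either orientation works for the later steps; only the indexing changes (nombres[1, ] and nombres[2, ] for the column-wise layout, cmb[, 1] and cmb[, 2] for the row-wise one).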
Then, create two lists of vectors, one for the first column of each pair and one for the second:
> lista1 <- lapply(nombres[1,], function(x){
+ df[,x]
+ })
>
> lista2 <- lapply(nombres[2,], function(x){
+ df[,x]
+ })
> lista1
[[1]]
[1] "a" "c" "a"
[[2]]
[1] "a" "c" "a"
[[3]]
[1] "c" "g" "g"
> lista2
[[1]]
[1] "c" "g" "g"
[[2]]
[1] "a" "a" "t"
[[3]]
[1] "a" "a" "t"
Finally, use mapply() with paste() on the two lists:
> mapply(function(x, y){
+ paste(x, y, sep = "")
+ }, x = lista1, y = lista2)
     [,1] [,2] [,3]
[1,] "ac" "aa" "ca"
[2,] "cg" "ca" "ga"
[3,] "ag" "at" "gt"
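The two intermediate lists are not strictly required. As a hedged variation on the step above (not the only way to write it), mapply() can iterate over the two rows of column names directly and look up each column inside the function. The name pasted is illustrative.

```r
df <- data.frame(col2 = c("a", "c", "a"),
                 col3 = c("c", "g", "g"),
                 col4 = c("a", "a", "t"),
                 stringsAsFactors = FALSE)
nombres <- combn(colnames(df), 2)

# Walk over both rows of names in parallel and paste the matching columns
pasted <- mapply(function(a, b) paste0(df[[a]], df[[b]]),
                 nombres[1, ], nombres[2, ])

# mapply() labels the result with its first argument ("col2", "col2", "col3");
# drop those labels so proper pair names can be assigned later
pasted <- unname(pasted)
```

The result is the same 3 x 3 character matrix as above, with one column per pair.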
Then you can name the columns of the matrix and cbind it to the original data frame:
> df2 <- mapply(function(x, y){
+ paste(x, y, sep = "")
+ }, x = lista1, y = lista2)
>
> colnames(df2) <- paste(nombres[1,], nombres[2,], sep = "")
>
> df_new <- cbind.data.frame(df, df2)
> df_new
  col2 col3 col4 col2col3 col2col4 col3col4
1    a    c    a       ac       aa       ca
2    c    g    a       cg       ca       ga
3    a    g    t       ag       at       gt
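For the data frame in the question, which also has a col1 that should not be combined, the same idea can be condensed. This is a sketch under the assumption that col1 holds the ind1..ind3 labels as an ordinary column; pairs and new_cols are illustrative names.

```r
df <- data.frame(col1 = c("ind1", "ind2", "ind3"),  # assumed to be a regular column
                 col2 = c("a", "c", "a"),
                 col3 = c("c", "g", "g"),
                 col4 = c("a", "a", "t"),
                 stringsAsFactors = FALSE)

cl <- c("col2", "col3", "col4")  # only these columns are paired
pairs <- combn(cl, 2)            # one pair per column

# For each pair (a column of `pairs`), paste the two data frame columns together
new_cols <- apply(pairs, 2, function(p) paste0(df[[p[1]]], df[[p[2]]]))

# Name each new column after the pair it came from
colnames(new_cols) <- paste0(pairs[1, ], pairs[2, ])

newdf <- cbind(df, new_cols, stringsAsFactors = FALSE)
```

Because the pairs come from cl rather than colnames(df), col1 is carried through unchanged and only the selected columns are combined.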
Hope this helps!