Combining R data frames by row and column position

Date: 2018-01-06 08:05:55

Tags: r dataframe

I have the following R data frames:

DF1

     a      b      c      d
2  0.671  0.105  0.181  0.241
3  0.446 -0.243  0.051  1.577
5  0.624  0.075 -0.451 -0.212

and DF2

     a      b      c      d
2  3.672  7.204 -0.164  3.251
3  4.445 -0.242  0.025  1.627
5  2.621  0.375 -0.468 -4.762

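Both data frames have the same dimensions. I want to combine them by index position, so that the final result is 12 vectors (or 12 one-dimensional data frames), each named after the index its values were drawn from.

For example, the result would be: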

a2(0.671,3.672)
b2(0.105,7.204) 
...
d5(-0.212,-4.762)

Thanks!

3 answers:

Answer 0: (score: 2)

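We can do this with base R:

# Pair up corresponding cells; t() makes the traversal run a2, b2, c2, d2, a3, ...
lst <- Map(`c`, t(DF1), t(DF2))
# Build matching names from the column and row labels
names(lst) <- do.call(paste0, expand.grid(dimnames(t(DF1))))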
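
As a quick check, here is a minimal sketch that runs the two lines above on the data from the question. The data frames are rebuilt with `data.frame()` only to keep the snippet self-contained; nothing else is assumed.

```r
# Input data from the question, with the original row labels 2, 3, 5
DF1 <- data.frame(a = c(0.671, 0.446, 0.624), b = c(0.105, -0.243, 0.075),
                  c = c(0.181, 0.051, -0.451), d = c(0.241, 1.577, -0.212),
                  row.names = c("2", "3", "5"))
DF2 <- data.frame(a = c(3.672, 4.445, 2.621), b = c(7.204, -0.242, 0.375),
                  c = c(-0.164, 0.025, -0.468), d = c(3.251, 1.627, -4.762),
                  row.names = c("2", "3", "5"))

# Same approach as in the answer
lst <- Map(`c`, t(DF1), t(DF2))
names(lst) <- do.call(paste0, expand.grid(dimnames(t(DF1))))

length(lst)  # 12 elements, one per cell
lst$a2       # 0.671 3.672
lst$d5       # -0.212 -4.762
```

Both `t()` and `expand.grid()` vary the column letter fastest, which is why the names line up with the values.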

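Answer 1: (score: 1)

Seeing that you plan to do do.call(cbind, ...) at the end, perhaps you should consider a different approach. You could easily write a function like the following: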

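library(data.table)

combineTranspose <- function(...) {
  temp <- list(...)
  # Melt each input to long format, keeping row names in column "rn"
  rbindlist(lapply(temp, function(x) {
    melt(as.data.table(x, keep.rownames = TRUE), "rn")
  }))[, dcast(.SD, rowid(variable, rn) ~ paste0(variable, rn),
              value.var = "value")]
  # Result: one column per cell (e.g. a2), one row per occurrence of that cell
}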

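The function takes a variable number of data.frames as input. It converts each one to a data.table with the row names added as a column, melts it to long format, binds the results together with rbindlist, and then reshapes the data to wide format.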
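
To make those steps easier to follow, here is a rough sketch that spells them out one at a time on the question's data. It assumes the data.table package is installed; the helper column names `source_id` and `cell` are made up for this illustration and are not part of the function above.

```r
library(data.table)

# Input data from the question
DF1 <- data.frame(a = c(0.671, 0.446, 0.624), b = c(0.105, -0.243, 0.075),
                  c = c(0.181, 0.051, -0.451), d = c(0.241, 1.577, -0.212),
                  row.names = c("2", "3", "5"))
DF2 <- data.frame(a = c(3.672, 4.445, 2.621), b = c(7.204, -0.242, 0.375),
                  c = c(-0.164, 0.025, -0.468), d = c(3.251, 1.627, -4.762),
                  row.names = c("2", "3", "5"))

# Step 1: melt each input to long format (row names kept in "rn") and stack them
long <- rbindlist(lapply(list(DF1, DF2), function(x) {
  melt(as.data.table(x, keep.rownames = TRUE), id.vars = "rn")
}))

# Step 2: number the repeats of each (variable, rn) cell: 1 for DF1, 2 for DF2
long[, source_id := rowid(variable, rn)]   # "source_id" is a hypothetical name

# Step 3: build the cell label (e.g. "a2") and cast to wide format
long[, cell := paste0(variable, rn)]       # "cell" is a hypothetical name
wide <- dcast(long, source_id ~ cell, value.var = "value")

wide$a2  # 0.671 3.672
```

Because the cells are matched by label rather than by position, nothing in these steps depends on the inputs sharing the same row or column order.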

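One advantage here is that the order of the columns and rows in the inputs, and even whether the same columns and rows are present in every input, doesn't matter. Here's a simple example.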

set.seed(1)

df1 <- data.frame(a = runif(3), b = runif(3), c = runif(3), 
                  d = runif(3), row.names = c(1, 2, 3))
df2 <- data.frame(a = runif(3), b = runif(3), c = runif(3), 
                  d = runif(3), row.names = c(1, 3, 4))
df3 <- data.frame(a = runif(3), b = runif(3), c = runif(3), 
                  d = runif(3), row.names = c(4, 2, 3))

combineTranspose(df1, df2, df3)
##    variable        a1        a2         a3        a4        b1        b2        b3
## 1:        1 0.2655087 0.3721239 0.57285336 0.7698414 0.9082078 0.2016819 0.8983897
## 2:        2 0.6870228 0.3861141 0.38410372 0.2672207 0.4976992 0.8696908 0.7176185
## 3:        3        NA        NA 0.01339033        NA        NA        NA 0.3403490
##           b4        c1        c2        c3        c4         d1        d2        d3
## 1: 0.9919061 0.9446753 0.6607978 0.6291140 0.9347052 0.06178627 0.2059746 0.1765568
## 2: 0.3823880 0.3800352 0.5995658 0.7774452 0.4820801 0.21214252 0.8273733 0.6516738
## 3:        NA        NA        NA 0.4935413        NA         NA        NA 0.6684667
##           d4
## 1: 0.1255551
## 2: 0.1862176
## 3:        NA

Here's the function applied to your input data:

DF1 <- structure(list(a = c(0.671, 0.446, 0.624), b = c(0.105, -0.243, 0.075), 
    c = c(0.181, 0.051, -0.451), d = c(0.241, 1.577, -0.212)), 
    .Names = c("a", "b", "c", "d"), row.names = c("2", "3", "5"), class = "data.frame")
DF2 <- structure(list(a = c(3.672, 4.445, 2.621), b = c(7.204, -0.242, 0.375), 
    c = c(-0.164, 0.025, -0.468), d = c(3.251, 1.627, -4.762)), 
    .Names = c("a", "b", "c", "d"), row.names = c("2", "3", "5"), class = "data.frame")

combineTranspose(DF1, DF2)
##    variable    a2    a3    a5    b2     b3    b5     c2    c3     c5    d2    d3     d5
## 1:        1 0.671 0.446 0.624 0.105 -0.243 0.075  0.181 0.051 -0.451 0.241 1.577 -0.212
## 2:        2 3.672 4.445 2.621 7.204 -0.242 0.375 -0.164 0.025 -0.468 3.251 1.627 -4.762

Answer 2: (score: 0)

Does this do what you want?

# sample data
df1 = read.table(text=" a      b      c      d
2  0.671  0.105  0.181  0.241
3  0.446 -0.243  0.051  1.577
5  0.624  0.075 -0.451 -0.212" ,header=T)    
df2 = read.table(text="     a      b      c      d
2  3.672  7.204 -0.164  3.251
3  4.445 -0.242  0.025  1.627
5  2.621  0.375 -0.468 -4.762" ,header=T)

# reshaping the dataframe    
library(reshape2)
library(dplyr)
df1$rowid = rownames(df1)  # keep the original row labels (2, 3, 5)
df2$rowid = rownames(df2)
df1 = melt(df1, id.vars=c("rowid"))
df2 = melt(df2, id.vars=c("rowid"))

df1 = df1 %>% full_join(df2,by=c('rowid','variable'))

Output:

   rowid variable value.x value.y
1      2        a   0.671   3.672
2      3        a   0.446   4.445
3      5        a   0.624   2.621
4      2        b   0.105   7.204
5      3        b  -0.243  -0.242
6      5        b   0.075   0.375
7      2        c   0.181  -0.164
8      3        c   0.051   0.025
9      5        c  -0.451  -0.468
10     2        d   0.241   3.251
11     3        d   1.577   1.627
12     5        d  -0.212  -4.762

Or, if you want a list of one-dimensional data frames:

y = split(df1[,c('value.x','value.y')],seq(nrow(df1)))
names(y) = paste0(df1$variable,df1$rowid)

Output:

$a2
  value.x value.y
1   0.671   3.672

$a3
  value.x value.y
2   0.446   4.445

$a5
  value.x value.y
3   0.624   2.621

$b2
  value.x value.y
4   0.105   7.204

$b3
  value.x value.y
5  -0.243  -0.242

$b5
  value.x value.y
6   0.075   0.375

$c2
  value.x value.y
7   0.181  -0.164

$c3
  value.x value.y
8   0.051   0.025

$c5
  value.x value.y
9  -0.451  -0.468

$d2
   value.x value.y
10   0.241   3.251

$d3
   value.x value.y
11   1.577   1.627

$d5
   value.x value.y
12  -0.212  -4.762
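
If plain numeric vectors are preferred over one-row data frames, each list element can be flattened. This is only a sketch: `joined` is a hypothetical stand-in for the joined data frame, containing just its first two rows so that the snippet runs on its own.

```r
# Hypothetical stand-in for the joined data frame (first two rows only)
joined <- data.frame(rowid = c("2", "3"),
                     variable = c("a", "a"),
                     value.x = c(0.671, 0.446),
                     value.y = c(3.672, 4.445))

# One-row data frames named by column and row label, as in the answer
y <- split(joined[, c("value.x", "value.y")], seq(nrow(joined)))
names(y) <- paste0(joined$variable, joined$rowid)

# Flatten each one-row data frame into an unnamed numeric vector
vecs <- lapply(y, function(d) unname(unlist(d)))

vecs$a2  # 0.671 3.672
```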

Hope this helps!