How can I create tables with different numbers of columns from different data.frames?

Asked: 2017-09-06 18:43:12

Tags: r

I have three different data.frame objects. These data.frame objects are called Experiment1, Experiment2, Experiment3 ... Experiment{n} (where n is NumberTubes divided by NumberParameters).

Experiment1:

                                         Name          Statistic NoCells                                             
1                                        CD4 subset      41.2   11935
2                            CD4 subset/CD39 subset      30.6    3656
3                            CD4 subset/CD69 subset      4.93     588
4                            CD4 subset/CD73 subset      49.8    5946
5                           CD4 subset/CD103 subset      2.62     313
6                     CD4 subset/integrin B7 subset      4.37     521
7                                      CD8a subset      33.5    9697
8                          CD8a subset/CD39 subset      54.3    5270
9                          CD8a subset/CD69 subset      5.48     531
10                          CD8a subset/CD73 subset      73.7    7148
11                        CD8a subset/CD103 subset      4.06     394

Experiment2:

                                         Name          Statistic NoCells                                             
1                                        CD4 subset      31.1   11935
2                            CD4 subset/CD39 subset      24.6    3656
3                            CD4 subset/CD69 subset      9.91     588
4                            CD4 subset/CD73 subset      45.1    5946
5                           CD4 subset/CD103 subset      2.61     313
6                     CD4 subset/integrin B7 subset      4.34     521
7                                      CD8a subset      33.2    9697
8                          CD8a subset/CD39 subset      84.3    5270
9                          CD8a subset/CD69 subset      2.48     531
10                          CD8a subset/CD73 subset      70.7    7148
11                        CD8a subset/CD103 subset      4.01     394

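Now I want to combine the $Statistic column from each data.frame object into tables. The number of columns in each table should be defined by the NumberRepeats variable.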

For example, suppose NumberRepeats = 3:

tab_1 <- cbind(Experiment1, Experiment2$Statistic, Experiment3$Statistic)
tab_2 <- cbind(Experiment4, Experiment5$Statistic, Experiment6$Statistic) 
....
tab_x <- cbind(Experimentn-2, Experimentn-1$Statistic, Experimentn$Statistic)

Another example, suppose NumberRepeats = 4:

tab_1 <- cbind(Experiment1, Experiment2$Statistic, Experiment3$Statistic, Experiment4$Statistic)
tab_2 <- cbind(Experiment5, Experiment6$Statistic, Experiment7$Statistic, Experiment8$Statistic) 
....
tab_x <- cbind(Experimentn-3, Experimentn-2$Statistic, Experimentn-1$Statistic, Experimentn$Statistic)

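How can I achieve this? The script should produce the same output as the cbind calls above, but generated automatically from the values of NumberRepeats and n (NumberTubes divided by NumberParameters).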

ExperimentalDesign:

  parameter repeat1 repeat2 repeat3
1  but       10.0   4.0  3.00
2  hip         4.0   3.0  2.00
3  H2S         0.2   0.1  0.05
4  pro          4.0   3.0  1.00
5  ace          5.0   4.0  3.00

Desired table_1 from the loop:

                             name  exp1 exp2  exp3  parameter
1                      CD4 subset  41.2 31.1 ...       but
2          CD4 subset/CD39 subset  30.6 24.6  ...      but

3 Answers:

Answer 0 (score: 1)

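With this code you can combine columns from different data frame objects into one table. You control the number of columns with the NumberRepeats variable. All tables stored in the list have as many data columns as NumberRepeats specifies, except possibly the last table. By the way, it was fun to build such a structure, but I am not sure it is a good way to analyze the data.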

# create test data
for(i in 1:17){
  Name <- letters[1:7]
  Statistic <- round(rnorm(7), 3)
  assign(paste0("Experiment",i), data.frame(Name, Statistic))
}    

# set some parameters
NumberRepeats <- 5
Experiment_n <- 17
skipTube <- c(3,7,11)

# let's go

out <- list()
list_index <- 1
counter <- 1
while (counter <= Experiment_n) {  # <= so the last experiment is not dropped

  tab <- NULL
  nam <- NULL
  # collect Statistic columns until the table is full or the experiments run out
  while ((is.null(tab) || ncol(tab) < NumberRepeats) && counter <= Experiment_n) {
    if (!(counter %in% skipTube)) {
      tab <- cbind(tab, get(paste0("Experiment", counter))$Statistic)
      # tab <- as.data.frame(tab)
      nam <- c(nam, paste0("Experiment", counter))
    }
    counter <- counter + 1
  }
  if (is.null(tab)) break  # only skipped tubes were left
  colnames(tab) <- nam
  rownames(tab) <- as.character(Experiment1$Name)

  out[[list_index]] <- tab
  assign(paste0('table_', list_index), tab)

  list_index <- list_index + 1  
}
out

# get an idea of the results
p_dat <- sapply(out, function(x) rowMeans(x))
barplot(t(p_dat), beside=TRUE)
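
As a side note, the same grouping can probably be expressed without assign() and get() if the experiments are kept in a list. The following is only a minimal sketch under that assumption, not the code above: the list `experiments` and the helper `build_tables()` are hypothetical names introduced for illustration, and every experiment is assumed to have the same rows in the same order.

```r
# Hypothetical test data: a named list instead of Experiment1..Experiment17
set.seed(1)
experiments <- lapply(1:17, function(i) {
  data.frame(Name = letters[1:7], Statistic = round(rnorm(7), 3))
})
names(experiments) <- paste0("Experiment", 1:17)

# build_tables() is an illustrative helper, not part of any package
build_tables <- function(experiments, NumberRepeats, skipTube = integer(0)) {
  # positions of the experiments that are not skipped
  keep <- setdiff(seq_along(experiments), skipTube)
  # group index 1,1,...,2,2,... over the kept experiments
  groups <- split(keep, ceiling(seq_along(keep) / NumberRepeats))
  lapply(groups, function(idx) {
    # one Statistic column per experiment; column names come from the list names
    tab <- sapply(experiments[idx], function(d) d$Statistic)
    rownames(tab) <- as.character(experiments[[idx[1]]]$Name)
    tab
  })
}

tables <- build_tables(experiments, NumberRepeats = 5, skipTube = c(3, 7, 11))
sapply(tables, ncol)
```

With these settings the 14 remaining experiments are split into tables of 5, 5 and 4 columns, so the last table is the only one that can be shorter than NumberRepeats.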

Answer 1 (score: 1)

# create test data
for(i in 1:17){
  Name <- letters[1:7]
  Statistic <- round(rnorm(7), 3)
  assign(paste0("Experiment",i), data.frame(Name, Statistic))
}    


# create the other data
dat2 <- c(10.0,   4.0,  3.00,
4.0,   3.0,  2.00,
0.2,   0.1,  0.05,
4.0,   3.0,  1.00,
5.0,   4.0,  3.00)

dat2 <- matrix(dat2, byrow=TRUE, ncol=3)
colnames(dat2) <- c('conc1', 'conc2', 'conc3')
rownames(dat2) <- c('but', 'hip', 'H2S', 'pro', 'ace')


# set some parameters
NumberRepeats <- 3
Experiment_n <- 17
skipTube <- c(3,7,11)

# let's go
out <- list()
list_index <- 1
counter <- 1
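# stop when the experiments or the parameters in dat2 run out
while (counter <= Experiment_n && list_index <= nrow(dat2)) {

  tab <- NULL
  nam <- NULL
  while ((is.null(tab) || ncol(tab) < NumberRepeats) && counter <= Experiment_n) {
    if (!(counter %in% skipTube)) {
      tab <- cbind(tab, get(paste0("Experiment", counter))$Statistic)
      tab <- as.data.frame(tab)
      nam <- c(nam, paste0("repeat", counter))
    }
    counter <- counter + 1
  }
  if (is.null(tab)) break  # only skipped tubes were left
  # name the columns after the concentrations of this parameter;
  # the last table may have fewer columns than dat2
  k <- min(ncol(tab), ncol(dat2))
  nam[seq_len(k)] <- dat2[list_index, seq_len(k)]
  colnames(tab) <- nam
  rownames(tab) <- as.character(Experiment1$Name)
  parameter <- rownames(dat2)[list_index]
  tab <- cbind(tab, parameter)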

  out[[list_index]] <- tab
  assign(paste0('table_', list_index), tab)

  list_index <- list_index + 1  
}
table_1
table_2
table_3


# average only the numeric columns (the parameter column is not numeric)
p_dat <- sapply(out, function(x) rowMeans(x[sapply(x, is.numeric)]))
barplot(t(p_dat), beside=TRUE)

Answer 2 (score: 0)

You can do this with a loop.

library(tidyverse)
library(data.table)

# make a list from all Experiment tables, ordered by experiment number
# (ls() sorts names as text, so Experiment10 would come before Experiment2)
exp_names <- ls(pattern = '^Experiment[0-9]+$')
exp_names <- exp_names[order(as.integer(sub('Experiment', '', exp_names)))]
df_list <- lapply(exp_names, get)
tables_index <- seq_along(df_list)

# set NumberRepeats value
NumberRepeats <- 1

# create index for cbind function: group 1 for the first NumberRepeats tables, and so on
subset_index <- ceiling(tables_index / NumberRepeats)

# loops for binding needed columns
experiment_list <- list()
for (i in unique(subset_index)) {
  indices <- tables_index[subset_index == i]
  # start from the full first table, then append only the Statistic columns
  experiment_df <- df_list[[indices[1]]]
  for (j in indices[-1]) {
    experiment_df <- cbind(experiment_df, df_list[[j]]['Statistic'])
  }
  experiment_list[[i]] <- experiment_df
}

# show result
experiment_list

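For me, the best way to work with and aggregate similar columns is to combine all the tables into one data.frame and then group by the different parameters.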
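
The idea of one combined data.frame could be sketched in base R roughly as follows. This is an illustration under assumptions, not a tested solution for the real data: the list `experiments`, the `parameters` vector (borrowed from the ExperimentalDesign table in the question) and the bookkeeping columns `experiment`, `parameter` and `repeat_no` are hypothetical names, and consecutive experiments are assumed to be repeats of the same parameter.

```r
# Hypothetical test data: 6 experiments = 2 parameters x 3 repeats
set.seed(1)
experiments <- lapply(1:6, function(i) {
  data.frame(Name = letters[1:4], Statistic = round(rnorm(4), 3))
})

NumberRepeats <- 3
parameters <- c("but", "hip")  # assumed order, taken from ExperimentalDesign

# stack everything into one long data.frame with bookkeeping columns
long <- do.call(rbind, lapply(seq_along(experiments), function(i) {
  group <- ceiling(i / NumberRepeats)
  cbind(experiments[[i]],
        experiment = i,
        parameter  = parameters[group],
        repeat_no  = (i - 1) %% NumberRepeats + 1)
}))

# group by parameter and Name, e.g. the mean Statistic over the repeats
means <- aggregate(Statistic ~ parameter + Name, data = long, FUN = mean)

# if a wide table per parameter is still needed, reshape one subset
wide_but <- reshape(long[long$parameter == "but",
                         c("Name", "Statistic", "repeat_no")],
                    idvar = "Name", timevar = "repeat_no", direction = "wide")
```

Keeping the data long means NumberRepeats only affects how the grouping columns are computed, and the wide tables from the question become a view that can be produced on demand.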