Renaming variables according to list position

Time: 2016-04-05 17:17:34

Tags: r lapply

I have a list of 216 data frames, each with 3 variables. For example:

df1 <- data.frame(A = 1:10, B= 11:20, C = 21:30)
df2 <- data.frame(A = 31:40, B = 41:50, C = 51:60) 
listDF <- list(df1, df2)

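I need to rename the variables in each data frame according to the data frame's position in the list. For a simple case I can do that. For example: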

#create lists of the variable names
Bnames <- c("feel1", "feel2")
Cnames <- c("cat1", "cat2")
#sequentially name each data frame's columns
k <- 0
for(i in 1:length(listDF)){
  k = k+1
  names(listDF[[i]]) <- c("ID",Bnames[k],Cnames[k])
  }
#I know people prefer lapply; I tend to switch back and forth depending on what I'm doing

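My problem is that across the list of 216 data frames (24 "cat" variables x 9 "feel" variables = 216), I need the "Bnames" and "Cnames" lists to advance at different rates. The first 9 data frames should have C = cat1 and B = feel1 through feel9, the next 9 should have C = cat2 and B = feel1 through feel9, and so on. So I need to cycle repeatedly through B while stepping along C only once every 9 data frames. "A" in every data frame should be "ID".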
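One way to get the two name vectors to advance at different rates is `rep()` with `times` for the fast-cycling names and `each` for the slow ones. The following is a minimal sketch, not a tested solution for the real data: the 216 data frames are replaced by hypothetical three-row placeholders, and `n_cat` and `n_feel` are names introduced here for illustration.

```r
# Hypothetical stand-in for the real list: 216 small data frames with columns A, B, C
n_cat  <- 24
n_feel <- 9
listDF <- replicate(n_cat * n_feel,
                    data.frame(A = 1:3, B = 4:6, C = 7:9),
                    simplify = FALSE)

# B names cycle quickly: feel1..feel9, feel1..feel9, ...
Bnames <- rep(paste0("feel", seq_len(n_feel)), times = n_cat)
# C names advance slowly: cat1 nine times, then cat2 nine times, ...
Cnames <- rep(paste0("cat", seq_len(n_cat)), each = n_feel)

# Rename each data frame by its position in the list
for (i in seq_along(listDF)) {
  names(listDF[[i]]) <- c("ID", Bnames[i], Cnames[i])
}

names(listDF[[1]])    # ID, feel1, cat1
names(listDF[[10]])   # ID, feel1, cat2
```

Because both vectors have length 216, a single index `i` is enough and no separate counter is needed.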

I really have no idea how to do this. Thanks in advance for any suggestions!

Also, if anyone can suggest a more understandable title, I'm happy to change it.

Edit

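It may help to know where I want to end up. Each ID appears in a different number of the data frames, and ultimately I want to reshape the data frames and merge them into 1 data frame in the following format: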

ID  feel1.1  feel1.2 ... feel2.1   feel2.2
2   NA       4           NA        7
3   2        1           6         3

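Here feel1.1 is the "feel1" value for "cat1", and a value is missing when an ID does not have that particular combination of "feel" and "cat" (so ID 2 has no "feel1" value for cat1 but does have one for cat2). In the end there should be 217 columns and as many rows as there are IDs.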

My (not very good) solution:

library(reshape2)  # provides melt() and dcast() used below
X <- listDF
#create lists of the data frame numbers for each "feel" variable
feel1 <- seq(1,216,by=9)
feel2 <- seq(2,216,by=9)
feel3 <- seq(3,216,by=9)
feel4 <- seq(4,216,by=9)
feel5 <- seq(5,216,by=9)
feel6 <- seq(6,216,by=9)
feel7 <- seq(7,216,by=9)
feel8 <- seq(8,216,by=9)
feel9 <- seq(9,216,by=9)

#assign correct names for the "feel" variables in each data frame
for(i in 1:length(X)){
  if(i %in% feel1){
    names(X[[i]]) <- c("UniqueID", "cat", "feel1")
  }
  if(i %in% feel2){
    names(X[[i]]) <- c("UniqueID", "cat", "feel2")
  }
  if(i %in% feel3){
    names(X[[i]]) <- c("UniqueID", "cat", "feel3")
  }
  if(i %in% feel4){
    names(X[[i]]) <- c("UniqueID", "cat", "feel4")
  }
  if(i %in% feel5){
    names(X[[i]]) <- c("UniqueID", "cat", "feel5")
  }
  if(i %in% feel6){
    names(X[[i]]) <- c("UniqueID", "cat", "feel6")
  }
  if(i %in% feel7){
    names(X[[i]]) <- c("UniqueID", "cat", "feel7")
  }
  if(i %in% feel8){
    names(X[[i]]) <- c("UniqueID", "cat", "feel8")
  }
  if(i %in% feel9){
    names(X[[i]]) <- c("UniqueID", "cat", "feel9")
  }
}

#'melt' each of the dataframes and then remove the rows with 'cat'
X <- lapply(X, function(x) melt(x, id.vars ="UniqueID"))
X <- lapply(X, function(x) subset(x, variable != "cat"))

#add the appropriate 'cat' number to each 'feel' name
for(j in 1:length(X)){
  if(j <= 9){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".24")
  }
  if(j > 9 & j <= 18){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".1")
  }
  if(j > 18 & j <= 27){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".2")
  }
  if(j > 27 & j <= 36){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".3")
  }
  if(j > 36 & j <= 45){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".4")
  }
  if(j > 45 & j <= 54){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".5")
  }
  if(j > 54 & j <= 63){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".6")
  }
  if(j > 63 & j <= 72){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".7")
  }
  if(j > 72 & j <= 81){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".8")
  }
  if(j > 81 & j <= 90){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".9")
  }
  if(j > 90 & j <= 99){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".10")
  }
  if(j > 99 & j <= 108){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".11")
  }
  if(j > 108 & j <= 117){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".12")
  }
  if(j > 117 & j <= 126){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".13")
  }
  if(j > 126 & j <= 135){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".14")
  }
  if(j > 135 & j <= 144){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".15")
  }
  if(j > 144 & j <= 153){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".16")
  }
  if(j > 153 & j <= 162){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".17")
  }
  if(j > 162 & j <= 171){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".18")
  }
  if(j > 171 & j <= 180){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".19")
  }
  if(j > 180 & j <= 189){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".20")
  }
  if(j > 189 & j <= 198){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".21")
  }
  if(j > 198 & j <= 207){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".22")
  }
  if(j > 207 & j <= 216){
    X[[j]]$variable <- paste0(X[[j]]$variable, ".23")
  }
}

#reshape each data frame into 2 columns: ID and the renamed 'feel' variable
X <- lapply(X, function(x) dcast(x, UniqueID ~ variable))

#merge it back onto the original dataset
for(i in 1:length(X)){
  data <- merge(data, X[[i]], by="UniqueID", all=TRUE)
}
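The long chains of `if` statements above can be replaced by computing both indices from the list position with `%%` and `%/%`. This is a sketch under stated assumptions rather than a drop-in replacement: it uses a hypothetical miniature list (2 "cat" levels x 3 "feel" variables), assumes each data frame has the columns `UniqueID`, `cat` and a feel value in that order, and assumes the cat numbers run 1, 2, ... in list order. The loop above instead labels the first block `.24`, so the mapping would need adjusting if that ordering is intended.

```r
# Hypothetical miniature of the real data: 2 x 3 = 6 data frames
# (the real case would be 24 x 9 = 216)
n_cat  <- 2
n_feel <- 3
set.seed(1)
X <- lapply(seq_len(n_cat * n_feel), function(i) {
  ids <- sort(sample(1:5, 3))  # each ID appears in only some data frames
  data.frame(UniqueID = ids, cat = (i - 1) %/% n_feel + 1, value = i * 10 + ids)
})

# Derive both indices from the list position instead of one `if` per case
wide_parts <- lapply(seq_along(X), function(i) {
  feel_idx <- (i - 1) %% n_feel + 1    # cycles 1..n_feel
  cat_idx  <- (i - 1) %/% n_feel + 1   # increases every n_feel data frames
  out <- X[[i]][, c(1, 3)]             # keep the ID and the feel value, drop cat
  names(out) <- c("UniqueID", paste0("feel", feel_idx, ".", cat_idx))
  out
})

# Full outer join of all pieces on UniqueID; missing combinations become NA
result <- Reduce(function(a, b) merge(a, b, by = "UniqueID", all = TRUE),
                 wide_parts)
names(result)
```

Since every piece already has one row per ID and one uniquely named value column, the melt and dcast steps are not needed in this variant.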

1 answer:

Answer 0 (score: 0)

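I combined them into one data frame with a variable recording which list element each row came from, then reshaped from long to wide format. Now you only have to rename a single set of variable names with one character vector (or list of strings, I'm not sure which), rather than keeping many lists of names for the individual list elements.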

# sample data
df1 <- data.frame(A = 1:10, B= 11:20, C = 21:30)
df2 <- data.frame(A = 5:14, B = 41:50, C = 51:60) 
listDF <- list(df1, df2)

require(reshape)
require(plyr)

# put them all in 1 long dataframe
df <- rbind.fill(listDF)

# label which list element they came from and pretty up the vars
df$listnum <- rep(seq_along(listDF), times = sapply(listDF, nrow))
names(df) <- c('id','cat','feel','listnum')

# change from long to wide
df <- reshape(df,idvar = 'id',timevar = 'listnum',direction = 'wide')

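I'm not entirely sure how you want the variables named, but the result above contains all the information you need. You just need to do a bit of names(df) <- sub(). To cycle at different rates, let R recycle the shorter vector. Something like: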

paste(rep(1:24,each = 9),1:9,sep = '.')
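To make the recycling concrete, the sketch below first builds the 216 suffixes and then applies the same idea to the two-element sample list from this answer. It is only an illustration: `do.call(rbind, ...)` is used as a base-R stand-in for `rbind.fill()` (adequate here because all columns match), and `n_feel`, `n_cat`, `new_feel` and `feel_cols` are names introduced for this example.

```r
# paste() recycles the shorter vector 1:9 against the 216-element rep()
suffix <- paste(rep(1:24, each = 9), 1:9, sep = ".")
length(suffix)   # 216
suffix[1:3]      # "1.1" "1.2" "1.3"
suffix[10]       # "2.1"

# Same sample data as in the answer
df1 <- data.frame(A = 1:10, B = 11:20, C = 21:30)
df2 <- data.frame(A = 5:14, B = 41:50, C = 51:60)
listDF <- list(df1, df2)

# Stack the data frames, tag each row with its list position, reshape to wide
df <- do.call(rbind, listDF)
df$listnum <- rep(seq_along(listDF), times = sapply(listDF, nrow))
names(df) <- c("id", "cat", "feel", "listnum")
df <- reshape(df, idvar = "id", timevar = "listnum", direction = "wide")

# Assumed layout: n_feel "feel" variables per "cat" level (9 and 24 in the
# question; 2 and 1 here so that the toy list of 2 data frames fits)
n_feel <- 2
n_cat  <- 1
new_feel <- paste0("feel", rep(seq_len(n_feel), times = n_cat), ".",
                   rep(seq_len(n_cat), each = n_feel))

# Keep id plus the feel columns and rename them by list position
feel_cols <- paste0("feel.", seq_along(listDF))
df <- df[, c("id", feel_cols)]
names(df) <- c("id", new_feel)
names(df)        # "id" "feel1.1" "feel2.1"
```

IDs that are missing from one of the data frames end up with NA in the corresponding column, which matches the target format described in the question.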