How to reorganize a data frame in R

Date: 2014-03-28 20:24:57

Tags: r csv dataframe

I imported a CSV file into a data.frame using read.table(). The data.frame looks like:

X1        X2   X3
Sample    A  
Lot      new
Name     Vol   %
Data     0.1   10
Data     0.2   20
Data     0.3   30
Sample    B  
Lot      old
Name     Vol   %
Data     0.1   50
Data     0.2   60
Data     0.3   70

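I'd like to reorganize this data.frame so that the first 3 data points are associated with Sample 'A' and Lot 'new', and the last 3 data points with Sample 'B' and Lot 'old'. I'm trying to come up with an elegant way to do this without a for loop, and without manually splitting out the data.frame rows with subsetting commands (i.e. dataA = mydataframe[4:6, ]).

The data.frame I ultimately want might look something like: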

A_new_Vol  A_new_%   B_old_Vol   B_old_%
  0.1        10         0.1        50
  0.2        20         0.2        60
  0.3        30         0.3        70

where the Sample, Lot, Vol and % information is contained in the column names.

Another possibility is for the data.frame to be:

Sample   Lot   Vol   %
  A      new   0.1   10
  A      new   0.2   20
  A      new   0.3   30
  B      old   0.1   50
  B      old   0.2   60
  B      old   0.3   70

Any pointers would be greatly appreciated. Thanks!

2 Answers:

Answer 0: (score: 3)

Assuming your data is in df:

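# Drop the first row, assumed here to be the X1/X2/X3 header line read in as data,
# and give the columns working names
df <- setNames(df[-1, ], c("type", "Vol", "%"))
# Every "Sample" row starts a new block; cumsum() numbers the blocks
df.lst <- split(df, cumsum(df[, 1] == "Sample"))
# For each block, attach Sample and Lot to the data rows, then stack the blocks
df.new <- do.call(
  rbind,
  lapply(df.lst, function(x) cbind(Sample=x[1, 2], Lot=x[2, 2], x[-(1:3), -1]))
)
df.new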

Produces (provided in dput form at the end):

     Sample Lot Vol  %
1.5       A new 0.1 10
1.6       A new 0.2 20
1.7       A new 0.3 30
2.11      B old 0.1 50
2.12      B old 0.2 60
2.13      B old 0.3 70
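To see why the split() call above groups the rows by sample block, here is a minimal self-contained sketch. It is only an illustration under assumptions: raw is a hypothetical reconstruction of the input with every column read as text and no header row stored as data, and pct is a stand-in name for the % column. It also converts the values to numbers, which the text columns would otherwise not be.

```r
# Hypothetical reconstruction of the raw input (columns X1..X3, all text)
raw <- data.frame(
  X1 = rep(c("Sample", "Lot", "Name", "Data", "Data", "Data"), 2),
  X2 = c("A", "new", "Vol", "0.1", "0.2", "0.3",
         "B", "old", "Vol", "0.1", "0.2", "0.3"),
  X3 = c("", "", "%", "10", "20", "30",
         "", "", "%", "50", "60", "70"),
  stringsAsFactors = FALSE
)

# Each "Sample" row starts a new block, so the running count of such rows
# labels every row with its block number.
block <- cumsum(raw$X1 == "Sample")
block
# 1 1 1 1 1 1 2 2 2 2 2 2

# Build the long-format rows for each block, converting text to numbers
long <- do.call(rbind, lapply(split(raw, block), function(x) {
  data.frame(
    Sample = x$X2[x$X1 == "Sample"],
    Lot    = x$X2[x$X1 == "Lot"],
    Vol    = as.numeric(x$X2[x$X1 == "Data"]),
    pct    = as.numeric(x$X3[x$X1 == "Data"]),
    stringsAsFactors = FALSE
  )
}))
rownames(long) <- NULL
long
```

Selecting rows by their label in X1 rather than by position is a design choice of this sketch; it keeps working if a block has a different number of Data rows.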

If you need the alternate format, you can use reshape2:

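library(reshape2)
# Row counter within each Sample/Lot group; it becomes the row key of the wide table
df.new$id2 <- ave(1:nrow(df.new), df.new$Sample, df.new$Lot, FUN=seq_along)
# Melt to long format, then cast one column per Sample/Lot/variable combination
dcast(
  melt(df.new, id.vars=c("Sample", "Lot", "id2")), 
  id2 ~ Sample + Lot + variable
)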

Produces:

  id2 A_new_Vol A_new_% B_old_Vol B_old_%
1   1       0.1      10       0.1      50
2   2       0.2      20       0.2      60
3   3       0.3      30       0.3      70

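Basically, you need to add an id column, melt again so that you are truly in "long" format, and then dcast to get the wide format.

Or, if you want base R, you can do the same thing (courtesy of Ananda):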

df.new <- within(df.new, {
  ID <- ave(rep(1, nrow(df.new)), Sample, FUN = seq_along)
  Time <- paste(Sample, Lot, sep = "_")
})
reshape(df.new, direction = "wide", idvar="ID", timevar="Time", drop=c("Sample", "Lot"))

Resulting in:

    ID Vol.A_new %.A_new Vol.B_old %.B_old
1.4  1       0.1      10       0.1      50
1.5  2       0.2      20       0.2      60
1.6  3       0.3      30       0.3      70
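Both wide-format approaches above depend on a row counter within each group (id2 or ID), which tells the reshaping step which rows of A and B belong on the same output line. A minimal sketch of just that step, assuming a long-format data frame x shaped like df.new (x and pct are illustrative names, not from the original):

```r
# Long-format data shaped like df.new (pct stands in for the % column)
x <- data.frame(
  Sample = rep(c("A", "B"), each = 3),
  Lot    = rep(c("new", "old"), each = 3),
  Vol    = rep(c(0.1, 0.2, 0.3), 2),
  pct    = c(10, 20, 30, 50, 60, 70)
)

# ave() applies seq_along() separately within each Sample/Lot group,
# so the counter restarts at 1 for every group
x$id2 <- ave(seq_len(nrow(x)), x$Sample, x$Lot, FUN = seq_along)
x$id2
# 1 2 3 1 2 3
```

Without such a counter there is nothing to say that the first row of A lines up with the first row of B.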

df.new starts as:
structure(list(Sample = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), Lot = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("new", "old"), class = "factor"), Vol = c(0.1, 0.2, 0.3, 0.1, 0.2, 0.3), "%" = c(10L, 20L, 30L, 50L, 60L, 70L), id2 = c(1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Sample", "Lot", "Vol", "%", "id2"), row.names = c("1.5", "1.6", "1.7", "2.11", "2.12", "2.13"), class = "data.frame")

Answer 1: (score: 0)

prev_sample_indices <- which(df[[1]] == 'Sample')
sample_indices <- c(prev_sample_indices[-1], nrow(df) + 1)

df <- Reduce(cbind, lapply(seq_along(sample_indices), function(index) {
  sample_index <- prev_sample_indices[index]
  label <- df[sample_index, 2] # A or B
  lot <- df[sample_index + 1, 2] # old or new
  data.frame(structure(lapply(2:3, function(i)
    df[seq(sample_index + 3, sample_indices[index] - 1), i]
  ), .Names = paste0(label, "_", lot, "_", c("Vol", "pct"))))                       
}))

Example

 df <- data.frame(c("Sample", "Lot", "Name", "Data", "Data", "Data", "Sample", "Lot", "Name", "Data", "Data", "Data"), c("A", "new", "Vol", (1:3)/10, "B", "old", "Vol", (1:3)/10), c("", "", "%", (1:3)*10, "", "", "%", (5:7)*10))
 colnames(df) <- paste0("X", 1:3)
 # Run above code
 print(df)
 #   A_new_Vol A_new_pct B_old_Vol B_old_pct
 # 1       0.1        10       0.1        50
 # 2       0.2        20       0.2        60
 # 3       0.3        30       0.3        70

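Note that, by default, data.frame() does not keep % in a column name: it gets converted to . (which is why pct is used above). Passing check.names = FALSE keeps the name as given.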
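A short sketch of what happens to % in column names, using a made-up one-column data frame; the variable names default_names and kept are only for illustration:

```r
# Default: check.names = TRUE passes the names through make.names(),
# which replaces characters that are not valid in a syntactic name with "."
default_names <- names(data.frame(`A_new_%` = 1:3))
default_names

# check.names = FALSE keeps the name verbatim
kept <- data.frame(`A_new_%` = 1:3, check.names = FALSE)
names(kept)

# A non-syntactic name then needs backticks or [[ ]] to be accessed
kept$`A_new_%`
kept[["A_new_%"]]
```

Keeping % is possible, but every later reference to the column needs quoting, so a plain name such as pct is often the easier choice.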