
Question

I imported a CSV file into a data.frame using read.table(). The data.frame looks like:

X1        X2   X3
Sample    A  
Lot      new
Name     Vol   %
Data     0.1   10
Data     0.2   20
Data     0.3   30
Sample    B  
Lot      old
Name     Vol   %
Data     0.1   50
Data     0.2   60
Data     0.3   70

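I would like to reorganize this data.frame so that the first 3 data points are associated with Sample 'A' and Lot 'new', and the last 3 data points are associated with Sample 'B' and Lot 'old'. I am trying to find an elegant way to do this without using a for loop and without manually splitting out the data.frame rows by subsetting (e.g. dataA = mydataframe[4:6, ]).

The data.frame I ultimately want might look like: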

A_new_Vol  A_new_%   B_old_Vol   B_old_%
  0.1        10         0.1        50
  0.2        20         0.2        60
  0.3        30         0.3        70

where the Sample, Lot, Vol, and % information is contained in the column names.

Another possibility is for the data.frame to be:

Sample   Lot   Vol   %
  A      new   0.1   10
  A      new   0.2   20
  A      new   0.3   30
  B      old   0.1   50
  B      old   0.2   60
  B      old   0.3   70

Any pointers would be much appreciated. Thanks!

Answer 1

Assuming your data is in df:

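# Drop row 1 and name the columns. This assumes the X1/X2/X3 header line
# was read in as the first data row, as the row names in the output suggest.
df <- setNames(df[-1, ], c("type", "Vol", "%"))
# Start a new group at every "Sample" row.
df.lst <- split(df, cumsum(df[, 1] == "Sample"))
# Per group: Sample and Lot come from rows 1 and 2; rows 1 to 3 are dropped.
df.new <- do.call(
  rbind,
  lapply(df.lst, function(x) cbind(Sample=x[1, 2], Lot=x[2, 2], x[-(1:3), -1]))
)
df.new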

Produces (provided in dput form at the end):

     Sample Lot Vol  %
1.5       A new 0.1 10
1.6       A new 0.2 20
1.7       A new 0.3 30
2.11      B old 0.1 50
2.12      B old 0.2 60
2.13      B old 0.3 70
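
For readers who want to run the split-and-bind idea end to end, here is a minimal sketch in base R. It assumes the file was read so that only the rows shown in the question are present (no header row stored as data) and that all columns are character. The names raw, groups, long, and pct are illustrative and do not come from the answer.

```r
# Rebuild the question's data as character columns (assumption: no factors).
raw <- data.frame(
  X1 = c("Sample", "Lot", "Name", "Data", "Data", "Data",
         "Sample", "Lot", "Name", "Data", "Data", "Data"),
  X2 = c("A", "new", "Vol", "0.1", "0.2", "0.3",
         "B", "old", "Vol", "0.1", "0.2", "0.3"),
  X3 = c("", "", "%", "10", "20", "30",
         "", "", "%", "50", "60", "70"),
  stringsAsFactors = FALSE
)

# Start a new group at every "Sample" row.
groups <- split(raw, cumsum(raw$X1 == "Sample"))

# For each group, keep the data rows and attach Sample and Lot as columns.
# The values are converted to numeric so later arithmetic works.
long <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    Sample = g$X2[1],
    Lot    = g$X2[2],
    Vol    = as.numeric(g$X2[-(1:3)]),
    pct    = as.numeric(g$X3[-(1:3)]),
    stringsAsFactors = FALSE
  )
}))
rownames(long) <- NULL
print(long)
```

The column is called pct here only to sidestep the non-syntactic name %. The explicit as.numeric() calls matter because every column of the raw input mixes labels and numbers, so the values arrive as text.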

If you want the alternate format, you can use reshape2:

library(reshape2)
df.new$id2 <- ave(1:nrow(df.new), df.new$Sample, df.new$Lot, FUN=seq_along)
dcast(
  melt(df.new, id.vars=c("Sample", "Lot", "id2")), 
  id2 ~ Sample + Lot + variable
)

Produces:

  id2 A_new_Vol A_new_% B_old_Vol B_old_%
1   1       0.1      10       0.1      50
2   2       0.2      20       0.2      60
3   3       0.3      30       0.3      70

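Basically, you need to add an id column, melt again so that you are truly in "long" format, and then dcast to get the wide format.

Or, if you want base R, you can do the same thing (courtesy of Ananda):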
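
To see what the id column contributes, the following small sketch computes it on its own. It assumes a long data.frame with Sample and Lot columns like the one above; the name long_demo is illustrative.

```r
# Illustrative long-format frame with the same grouping columns.
long_demo <- data.frame(
  Sample = rep(c("A", "B"), each = 3),
  Lot    = rep(c("new", "old"), each = 3),
  stringsAsFactors = FALSE
)

# Number the rows within each Sample/Lot combination.
id2 <- ave(seq_len(nrow(long_demo)), long_demo$Sample, long_demo$Lot,
           FUN = seq_along)
print(id2)
# [1] 1 2 3 1 2 3
```

The within-group counter is what lets the wide format line up the first A row with the first B row, the second with the second, and so on.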

df.new <- within(df.new, {
  ID <- ave(rep(1, nrow(df.new)), Sample, FUN = seq_along)
  Time <- paste(Sample, Lot, sep = "_")
})
reshape(df.new, direction = "wide", idvar="ID", timevar="Time", drop=c("Sample", "Lot"))

Resulting in:

    ID Vol.A_new %.A_new Vol.B_old %.B_old
1.4  1       0.1      10       0.1      50
1.5  2       0.2      20       0.2      60
1.6  3       0.3      30       0.3      70

df.new starts as:

structure(list(Sample = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), Lot = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("new", "old"), class = "factor"), Vol = c(0.1, 0.2, 0.3, 0.1, 0.2, 0.3), "%" = c(10L, 20L, 30L, 50L, 60L, 70L), id2 = c(1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Sample", "Lot", "Vol", "%", "id2"), row.names = c("1.5", "1.6", "1.7", "2.11", "2.12", "2.13"), class = "data.frame")

Answer 2

prev_sample_indices <- which(df[[1]] == 'Sample')
sample_indices <- c(prev_sample_indices[-1], nrow(df) + 1)

df <- Reduce(cbind, lapply(seq_along(sample_indices), function(index) {
  sample_index <- prev_sample_indices[index]
  label <- df[sample_index, 2] # A or B
  lot <- df[sample_index + 1, 2] # old or new
  data.frame(structure(lapply(2:3, function(i)
    df[seq(sample_index + 3, sample_indices[index] - 1), i]
  ), .Names = paste0(label, "_", lot, "_", c("Vol", "pct"))))                       
}))

Example:

 df <- data.frame(c("Sample", "Lot", "Name", "Data", "Data", "Data", "Sample", "Lot", "Name", "Data", "Data", "Data"), c("A", "new", "Vol", (1:3)/10, "B", "old", "Vol", (1:3)/10), c("", "", "%", (1:3)*10, "", "", "%", (5:7)*10))
 colnames(df) <- paste0("X", 1:3)
 # Run above code
 print(df)
 #   A_new_Vol A_new_pct B_old_Vol B_old_pct
 # 1       0.1        10       0.1        50
 # 2       0.2        20       0.2        60
 # 3       0.3        30       0.3        70

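Note that data.frame() does not keep % in column names by default: with check.names = TRUE it is converted to a dot (.), which is why the columns are named pct here.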
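
As a quick illustration of that renaming, this sketch builds the same one-column data.frame twice, once with the default settings and once with check.names = FALSE. The object names d_default and d_kept are illustrative.

```r
# Default: column names are passed through make.names(), so "%" becomes ".".
d_default <- data.frame("A_new_%" = 1:3)
print(names(d_default))
# [1] "A_new_."

# With check.names = FALSE the name is kept as written.
d_kept <- data.frame("A_new_%" = 1:3, check.names = FALSE)
print(names(d_kept))
# [1] "A_new_%"
```

A column kept this way has a non-syntactic name, so it has to be accessed with backticks or quotes, for example d_kept[["A_new_%"]].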

How to reorganize a data frame in R

2 answers: