I imported a CSV file into a data.frame using read.table(). The data.frame looks like this:
X1 X2 X3
Sample A
Lot new
Name Vol %
Data 0.1 10
Data 0.2 20
Data 0.3 30
Sample B
Lot old
Name Vol %
Data 0.1 50
Data 0.2 60
Data 0.3 70
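I would like to reorganize this data.frame so that the first 3 data points are associated with Sample 'A' and Lot 'new', and the last 3 data points with Sample 'B' and Lot 'old'. I am trying to find an elegant way to do this without a for loop and without manually slicing rows out of the data.frame by index (i.e. dataA = mydataframe[4:6, ]).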
The data.frame I ultimately want might look something like this:
A_new_Vol A_new_% B_old_Vol B_old_%
0.1 10 0.1 50
0.2 20 0.2 60
0.3 30 0.3 70
where the Sample, Lot, Vol and % information is contained in the column names.
Another possibility would be for the data.frame to look like this:
Sample Lot Vol %
A new 0.1 10
A new 0.2 20
A new 0.3 30
B old 0.1 50
B old 0.2 60
B old 0.3 70
Any pointers would be greatly appreciated. Thanks!
Answer 0 (score: 3)
Assuming your data is in df:
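# Drop row 1 and rename the columns (this assumes the X1/X2/X3 header line
# was read in as the first data row, as the row names in the output suggest)
df <- setNames(df[-1, ], c("type", "Vol", "%"))
# Start a new group at every "Sample" row
df.lst <- split(df, cumsum(df[, 1] == "Sample"))
# Per group: take Sample and Lot from rows 1-2 and keep only the data rows
df.new <- do.call(
  rbind,
  lapply(df.lst, function(x) cbind(Sample=x[1, 2], Lot=x[2, 2], x[-(1:3), -1]))
)
df.new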
This produces the following (the data is provided in dput form at the end):
Sample Lot Vol %
1.5 A new 0.1 10
1.6 A new 0.2 20
1.7 A new 0.3 30
2.11 B old 0.1 50
2.12 B old 0.2 60
2.13 B old 0.3 70
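The grouping step relies on cumsum() over a logical vector: every "Sample" row increments the counter, so all rows up to the next "Sample" share one group id. A minimal sketch of just that idea, using a made-up marker vector (the names marker and group_id are illustrative and not part of the answer):

```r
# Hypothetical first column of a stacked file: two blocks, each starting at "Sample"
marker <- c("Sample", "Lot", "Name", "Data", "Data",
            "Sample", "Lot", "Name", "Data")

# TRUE counts as 1, so the running sum goes up by one at each "Sample" row
group_id <- cumsum(marker == "Sample")
group_id

# split() then collects the row positions that belong to each block
blocks <- split(seq_along(marker), group_id)
blocks
```

Because the group id is derived from the data itself, the blocks do not need to have the same number of rows.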
If you need the alternative (wide) format, you can use reshape2:
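library(reshape2)
# Number the rows within each Sample/Lot group so they can be aligned side by side
df.new$id2 <- ave(1:nrow(df.new), df.new$Sample, df.new$Lot, FUN=seq_along)
# Melt to long format, then cast to one column per Sample/Lot/variable combination
dcast(
  melt(df.new, id.vars=c("Sample", "Lot", "id2")),
  id2 ~ Sample + Lot + variable
)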
This produces:
id2 A_new_Vol A_new_% B_old_Vol B_old_%
1 1 0.1 10 0.1 50
2 2 0.2 20 0.2 60
3 3 0.3 30 0.3 70
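Basically, you need to add an id column, melt again so that the data is truly in "long" format, and then use dcast to get the wide format.
Alternatively, if you want base R, you can do the same thing (courtesy of Ananda):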
df.new <- within(df.new, {
  ID <- ave(rep(1, nrow(df.new)), Sample, FUN = seq_along)
  Time <- paste(Sample, Lot, sep = "_")
})
reshape(df.new, direction = "wide", idvar="ID", timevar="Time", drop=c("Sample", "Lot"))
Resulting in:
ID Vol.A_new %.A_new Vol.B_old %.B_old
1.4 1 0.1 10 0.1 50
1.5 2 0.2 20 0.2 60
1.6 3 0.3 30 0.3 70
df.new in dput form:
structure(list(Sample = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), Lot = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("new", "old"), class = "factor"), Vol = c(0.1, 0.2, 0.3, 0.1, 0.2, 0.3), "%" = c(10L, 20L, 30L, 50L, 60L, 70L), id2 = c(1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Sample", "Lot", "Vol", "%", "id2"), row.names = c("1.5", "1.6", "1.7", "2.11", "2.12", "2.13"), class = "data.frame")
Answer 1 (score: 0)
# Rows where each block starts, and where the following block starts (or the end of the data)
prev_sample_indices <- which(df[[1]] == 'Sample')
sample_indices <- c(prev_sample_indices[-1], nrow(df) + 1)
df <- Reduce(cbind, lapply(seq_along(sample_indices), function(index) {
  sample_index <- prev_sample_indices[index]
  label <- df[sample_index, 2]    # A or B
  lot <- df[sample_index + 1, 2]  # old or new
  # Columns 2 and 3 of the block's data rows, named <label>_<lot>_Vol and <label>_<lot>_pct
  data.frame(structure(lapply(2:3, function(i)
    df[seq(sample_index + 3, sample_indices[index] - 1), i]
  ), .Names = paste0(label, "_", lot, "_", c("Vol", "pct"))))
}))
df <- data.frame(c("Sample", "Lot", "Name", "Data", "Data", "Data", "Sample", "Lot", "Name", "Data", "Data", "Data"), c("A", "new", "Vol", (1:3)/10, "B", "old", "Vol", (1:3)/10), c("", "", "%", (1:3)*10, "", "", "%", (5:7)*10))
colnames(df) <- paste0("X", 1:3)
# Run above code
print(df)
# A_new_Vol A_new_pct B_old_Vol B_old_pct
# 1 0.1 10 0.1 50
# 2 0.2 20 0.2 60
# 3 0.3 30 0.3 70
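Note that, by default, data.frame() does not keep % in column names: the character is converted to a dot (.) unless you pass check.names = FALSE. The code above uses pct in the column names instead.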
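To see the name conversion in isolation, here is a small sketch. The objects converted and kept are illustrative names only. It compares the default behaviour of data.frame() with check.names = FALSE:

```r
# Default (check.names = TRUE): names are passed through make.names(),
# which replaces characters that are not valid in a syntactic name with "."
converted <- data.frame(`A_new_%` = c(10, 20, 30))
names(converted)

# check.names = FALSE keeps the name exactly as written
kept <- data.frame(`A_new_%` = c(10, 20, 30), check.names = FALSE)
names(kept)

# Non-syntactic names then need [[ ]] or backticks when accessed
kept[["A_new_%"]]
kept$`A_new_%`
```

Keeping the % is possible, but every later reference to the column has to be quoted, so a plain name such as pct is often the more convenient choice.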