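I have two data.frames. One contains the run order of several experiments performed in triplicate (DF1, the design table); the other contains the results of those experiments (also in triplicate; DF2, the results table). The design table has a randomized run order, and the results table is in a different order.
The first six columns of DF1 contain the experimental factors, e.g. temperature, reagent equivalents, etc. The results table DF2 has the same six columns plus additional columns holding the experimental results, e.g. yield, conversion of various reagents, etc.
The tables also differ in the number of rows: the results table has three fewer rows than the design table.
How can I combine the two tables so that the results are appended to the design, with each set of experimental parameters in the design table matched to the corresponding results in the results table?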
DF1
T1 A1 B1
T2 A1 B1
T1 A2 B1
T2 A2 B1
T1 A1 B2
T2 A1 B2
T1 A2 B2
T2 A2 B2
but in triplicate.
DF2
T1 A2 B2 1
T1 A2 B1 3
T2 A2 B1 3
T1 A1 B1 1
T2 A1 B2 2
T2 A2 B2 2
T2 A1 B1 2
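Again in triplicate; note that one row is missing. Also note that there are more result columns than shown here.
As for the point of all this: I am investigating whether the RcmdrPlugin.DoE package can be applied to some real data.
As for what I have tried... well, I considered using sapply, cbind and ifelse with logical comparisons: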
sapply(
DF3 <- ifelse( DF1[,1] == DF2[,1] | DF1[,2] == DF2[,2] | DF2[,3] == DF2[,3],
cbind(DF1, DF2[,3]), NA)
)
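I ran into problems with NA in this code. But before I even got to the NAs, I got an error saying that argument 'FUN' is missing.
I think I am either way off the mark or very close to the answer, but I cannot tell which. Can someone point me in the right direction?
Edit: here is a seven-row sample of my data. I changed the headers to A, B, C and D; these are the columns common to both data frames.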
run.no run.no.std.rp Block.ccd A B C D
C0.17 1 C0.17 0 400 147.5 5 2.675
C0.7 2 C0.7 0 450 120.0 2 4.000
C0.6 3 C0.6 0 350 175.0 2 4.000
C0.3 4 C0.3 0 450 120.0 8 4.000
C0.4 5 C0.4 0 350 120.0 8 4.000
C0.16 6 C0.16 0 350 120.0 2 1.350
C0.15 7 C0.15 0 450 120.0 2 1.350
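The other data.frame contains the headers A, B, C and D plus columns holding yield, conversion and other results. I need the first data.frame exactly as shown, with yield etc. appended at the end.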
Answer 0 (score: 5)
The data.table package (which allows the x[y] join syntax) makes this very easy. Assuming df1 and df2 are your data.frames:
require(data.table)
dt1 <- data.table(df1, key=c("V1","V2","V3"))
dt2 <- data.table(df2, key=c("V1","V2","V3"))
dt2[dt1]
# V1 V2 V3 V4
# 1: T1 A1 B1 1
# 2: T1 A1 B2 NA
# 3: T1 A2 B1 3
# 4: T1 A2 B2 1
# 5: T2 A1 B1 2
# 6: T2 A1 B2 2
# 7: T2 A2 B1 3
# 8: T2 A2 B2 2
which gives you the desired result.
Edit: I have now tried this with your edited data, and it seems to work.
df1 <- structure(list(V1 = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
.Label = c("T1", "T2"), class = "factor"),
V2 = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
.Label = c("A1", "A2"), class = "factor"),
V3 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
.Label = c("B1", "B2"), class = "factor")),
.Names = c("V1", "V2", "V3"),
class = "data.frame", row.names = c(NA, -8L))
df2 <- structure(list(V1 = structure(c(1L, 1L, 2L, 1L, 2L, 2L, 2L),
.Label = c("T1", "T2"), class = "factor"),
V2 = structure(c(2L, 2L, 2L, 1L, 1L, 2L, 1L),
.Label = c("A1", "A2"), class = "factor"),
V3 = structure(c(2L, 1L, 1L, 1L, 2L, 2L, 1L),
.Label = c("B1", "B2"), class = "factor"),
run.no = 1:7,
run.no.std.rp = structure(c(3L, 7L, 6L, 4L, 5L, 2L, 1L),
.Label = c("C0.15", "C0.16", "C0.17", "C0.3", "C0.4", "C0.6", "C0.7"),
class = "factor"),
Block.ccd = c(0L, 0L, 0L, 0L, 0L, 0L, 0L),
A = c(400L, 450L, 350L, 450L, 350L, 350L, 450L),
B = c(147.5, 120, 175, 120, 120, 120, 120),
C = c(5L, 2L, 2L, 8L, 8L, 2L, 2L),
D = c(2.675, 4, 4, 4, 4, 1.35, 1.35)),
.Names = c("V1", "V2", "V3", "run.no", "run.no.std.rp",
"Block.ccd", "A", "B", "C", "D"),
row.names = c("C0.17", "C0.7", "C0.6", "C0.3", "C0.4",
"C0.16", "C0.15"), class = "data.frame")
require(data.table)
dt1 <- data.table(df1, key=c("V1", "V2", "V3"))
dt2 <- data.table(df2, key=c("V1", "V2", "V3"))
dt2[dt1]
# V1 V2 V3 run.no run.no.std.rp Block.ccd A B C D
# 1: T1 A1 B1 4 C0.3 0 450 120.0 8 4.000
# 2: T1 A1 B2 NA NA NA NA NA NA NA
# 3: T1 A2 B1 2 C0.7 0 450 120.0 2 4.000
# 4: T1 A2 B2 1 C0.17 0 400 147.5 5 2.675
# 5: T2 A1 B1 7 C0.15 0 450 120.0 2 1.350
# 6: T2 A1 B2 5 C0.4 0 350 120.0 8 4.000
# 7: T2 A2 B1 3 C0.6 0 350 175.0 2 4.000
# 8: T2 A2 B2 6 C0.16 0 350 120.0 2 1.350
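Setting keys sorts both tables, so dt2[dt1] comes back ordered by V1, V2, V3 rather than in the design's run order. If the run order matters, one option is to skip the keys and name the join columns with the on argument instead. The following is only a minimal sketch on made-up data: the tables design and results and the yield column are hypothetical stand-ins for your data, and it assumes a data.table version that supports on (1.9.6 or later).

```r
library(data.table)

# Hypothetical miniature of the design table, in randomized run order.
design <- data.table(V1 = c("T2", "T1", "T1", "T2"),
                     V2 = c("A1", "A2", "A1", "A2"),
                     V3 = c("B1", "B1", "B2", "B2"))

# Hypothetical results table, in a different order and with one run missing.
results <- data.table(V1 = c("T1", "T2", "T2"),
                      V2 = c("A2", "A2", "A1"),
                      V3 = c("B1", "B2", "B1"),
                      yield = c(3, 2, 2))

# One output row per row of `design`, in `design`'s own order;
# design rows without a match in `results` get NA in `yield`.
joined <- results[design, on = c("V1", "V2", "V3")]
print(joined)
```

Because no key is set, neither table is reordered, and the result follows the row order of the table inside the brackets.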
Answer 1 (score: 1)
Your title mentions "merge", but you don't seem to have tried the merge function. (Or am I missing something?)
Here are your first two example data.frames:
DF1 <- structure(list(T1 = c("T2", "T1", "T2", "T1", "T2", "T1", "T2"
), A1 = c("A1", "A2", "A2", "A1", "A1", "A2", "A2"), B1 = c("B1",
"B1", "B1", "B2", "B2", "B2", "B2")), .Names = c("T1", "A1",
"B1"), class = "data.frame", row.names = c(NA, -7L))
DF2 <- structure(list(T1 = c("T1", "T2", "T1", "T2", "T2", "T2"), A2 = c("A2",
"A2", "A1", "A1", "A2", "A1"), B2 = c("B1", "B1", "B1", "B2",
"B2", "B1"), X1 = c(3L, 3L, 1L, 2L, 2L, 2L)), .Names = c("T1",
"A2", "B2", "X1"), class = "data.frame", row.names = c(NA, -6L))
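Here is how you would use merge from base R. The by.x and by.y arguments should contain the names of the columns to match on in the first and second data.frame, respectively. The all argument says not to drop rows that have no match in the other data.frame, but to keep them and fill the gaps with NA.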
merge(DF1, DF2,
by.x = c("T1", "A1", "B1"),
by.y = c("T1", "A2", "B2"),
all = TRUE)
# T1 A1 B1 X1
# 1 T1 A1 B1 1
# 2 T1 A1 B2 NA
# 3 T1 A2 B1 3
# 4 T1 A2 B2 NA
# 5 T2 A1 B1 2
# 6 T2 A1 B2 2
# 7 T2 A2 B1 3
# 8 T2 A2 B2 2
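Here is the result of merge on the two data.frames created by Arun. Note that we don't need to specify the columns to merge on, because the two data.frames share common column names.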
merge(df1, df2, all = TRUE)
# V1 V2 V3 run.no run.no.std.rp Block.ccd A B C D
# 1 T1 A1 B1 4 C0.3 0 450 120.0 8 4.000
# 2 T1 A1 B2 NA <NA> NA NA NA NA NA
# 3 T1 A2 B1 2 C0.7 0 450 120.0 2 4.000
# 4 T1 A2 B2 1 C0.17 0 400 147.5 5 2.675
# 5 T2 A1 B1 7 C0.15 0 450 120.0 2 1.350
# 6 T2 A1 B2 5 C0.4 0 350 120.0 8 4.000
# 7 T2 A2 B1 3 C0.6 0 350 175.0 2 4.000
# 8 T2 A2 B2 6 C0.16 0 350 120.0 2 1.350
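Two things to watch with your real data: merge sorts the result on the key columns by default, and with triplicates each factor combination appears several times in both tables, so merging on the factors alone would pair every replicate with every other one. Here is a minimal base R sketch of one way around both. It assumes the replicates can be paired in order of appearance, which the question does not confirm, and the data and the names design, results, rep, row_id and rep_index are all hypothetical.

```r
# Hypothetical design: 2 factor combinations, each run twice, randomized order.
design <- data.frame(V1 = c("T2", "T1", "T1", "T2"),
                     V2 = c("A1", "A2", "A2", "A1"),
                     stringsAsFactors = FALSE)

# Hypothetical results in a different order; one T1/A2 run is missing.
results <- data.frame(V1 = c("T1", "T2", "T2"),
                      V2 = c("A2", "A1", "A1"),
                      yield = c(71, 64, 66),
                      stringsAsFactors = FALSE)

# Number the replicates within each factor combination (1, 2, ...),
# so that each design row can match at most one result row.
rep_index <- function(df, cols) {
  g <- interaction(df[cols], drop = TRUE)
  ave(seq_len(nrow(df)), g, FUN = seq_along)
}
keys <- c("V1", "V2")
design$rep  <- rep_index(design, keys)
results$rep <- rep_index(results, keys)

# Remember the original run order, merge, then restore that order.
design$row_id <- seq_len(nrow(design))
out <- merge(design, results, by = c(keys, "rep"), all.x = TRUE)
out <- out[order(out$row_id), ]
rownames(out) <- NULL
print(out)
```

Using all.x = TRUE instead of all = TRUE keeps exactly the rows of the design table, so a design run with no recorded result shows up with NA rather than being dropped.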