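Update below
Original:
I am trying to find the most elegant (simple and concise) way to replace the values in certain columns by matching on two columns of another data frame.
This is the table containing the columns I want to replace (based on the values they contain).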
> cost.table
Identifier Phase.0.Difficulty Phase.1.Complexity Phase.2.Complexity Phase.3.Complexity Phase.4.Complexity Phase.5.Complexity
1 FS1 Low Low Low Medium Medium High
2 FS2 High High High Medium Medium Medium
3 FS3 High Low Low High High High
4 FS4 High Medium Medium Medium Medium Medium
5 FS5 High Medium Medium High Medium Medium
Phase.6.Complexity Transaction.Feasibility Approach
1 High Medium B
2 Medium Medium I
3 High Medium B
4 Medium Medium I
5 Medium Medium B
Below is the lookup table I want to use to find the correct replacement values.
> cost.approach.difficulty
Approach Difficulty Phase 0 Phase 1 Phase 2 Phase 3 Phase 4 Phase 5 Phase 6
1 B High 18102.778 29481.67 29481.67 11822.222 30737.78 21634.67 12768.00
2 B Low 3860.694 15978.47 11175.69 7448.000 12768.00 11467.56 11467.56
3 B Medium 5323.694 24974.44 15184.17 9221.333 15368.89 12768.00 12768.00
4 I High 18102.778 74184.44 29481.67 44747.111 69160.00 45249.56 32245.11
5 I Low 3860.694 26008.89 11175.69 16551.111 35910.00 16876.22 14275.33
6 I Medium 5323.694 41156.11 15184.17 22373.556 44776.67 23378.44 16876.22
7 RV High 18102.778 28373.33 29481.67 44747.111 69160.00 45249.56 32245.11
8 RV Low 3860.694 14870.14 11175.69 16551.111 44776.67 16876.22 14275.33
9 RV Medium 5323.694 22757.78 15184.17 22373.556 44776.67 23378.44 16876.22
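I am looking for a simple solution that looks up the corresponding value in the cost.approach.difficulty table by Approach and Difficulty.
For example, in cost.table, I want Phase.0.Difficulty in the first row to be replaced with 3860.694 (because its approach is 'B' and its difficulty is Low).
Does anyone have an elegant, simple solution for looking up values based on two (or more) columns and replacing values across multiple columns?
Thanks,
Andrew
Update:
Two of the suggested answers rely on merge. My goal is a cleaner, more concise, more elegant solution. This is the best approach I have come up with so far: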
cost.approach.difficulty$Phase.0[match(paste(cost.table$Approach, cost.table$Phase.0.Difficulty), paste(cost.approach.difficulty$Approach, cost.approach.difficulty$Difficulty))]
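The problem with this solution is that I need to know the column names in advance, and it still looks like a hack. Does anyone have a cleaner solution?
Answer 0 (score: 4)
If you want this to work for a variable number of columns, I suggest reshaping your cost table and lookup table into a more normalized format.
First, it would be easier to answer this question if you provided the data in a reproducible format: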
# Create the example data
cost.table <- data.frame(
"Identifier" = c("FS1", "FS2", "FS3", "FS4", "FS5"),
"Phase.0.Difficulty" = c("Low", "High", "High", "High", "High"),
"Phase.1.Complexity" = c("Low", "High", "Low", "Medium", "Medium"),
"Phase.2.Complexity" = c("Low", "High", "Low", "Medium", "Medium"),
"Phase.3.Complexity" = c("Medium", "Medium", "High", "Medium", "High"),
"Phase.4.Complexity" = c("Medium", "Medium", "High", "Medium", "Medium"),
"Phase.5.Complexity" = c("High", "Medium", "High", "Medium", "Medium"),
"Phase.6.Complexity" = c("High", "Medium", "High", "Medium", "Medium"),
"Transaction.Feasibility" = c("Medium", "Medium", "Medium", "Medium", "Medium"),
"Approach" = c("B", "I", "B", "I", "B"),
stringsAsFactors = FALSE)
cost.approach.difficulty <- data.frame(
"Approach" = c("B", "B", "B", "I", "I", "I", "RV", "RV", "RV"),
"Difficulty" = c("High", "Low", "Medium", "High", "Low", "Medium", "High", "Low", "Medium"),
"Phase.0" = c(18102.778, 3860.694, 5323.694, 18102.778, 3860.694, 5323.694, 18102.778, 3860.694, 5323.694),
"Phase.1" = c(29481.67,15978.47, 24974.44, 74184.44, 26008.89, 41156.11, 28373.33, 14870.14, 22757.78),
"Phase.2" = c(29481.67, 11175.69, 15184.17, 29481.67, 11175.69, 15184.17, 29481.67, 11175.69, 15184.17),
"Phase.3" = c(11822.222, 7448, 9221.333, 44747.111, 16551.111, 22373.556, 44747.111, 16551.111, 22373.556),
"Phase.4" = c(30737.78, 12768, 15368.89, 69160, 35910, 44776.67, 69160, 44776.67, 44776.67),
"Phase.5" = c(21634.67, 11467.56, 12768, 45249.56, 16876.22, 23378.44, 45249.56, 16876.22, 23378.44),
"Phase.6" = c(12768, 11467.56, 12768, 32245.11, 14275.33, 16876.22, 32245.11, 14275.33, 16876.22),
stringsAsFactors = FALSE)
After recreating the example data, I used the melt.data.frame function from the reshape2 package:
# Reshape the data
library(reshape2)  # provides melt() and dcast()
cost.table <- melt(cost.table, id.vars = c("Identifier", "Approach"),
value.name = "Size")
cost.table$Phase <- gsub("(\\w+\\.\\d+)\\.(\\w+)", "\\1",
as.character(cost.table$variable), perl = TRUE)
cost.table$Type <- gsub("(\\w+\\.\\d+)\\.(\\w+)", "\\2",
as.character(cost.table$variable), perl = TRUE)
head(cost.table)
Identifier Approach variable Size Phase Type
1 FS1 B Phase.0.Difficulty Low Phase.0 Difficulty
2 FS2 I Phase.0.Difficulty High Phase.0 Difficulty
3 FS3 B Phase.0.Difficulty High Phase.0 Difficulty
4 FS4 I Phase.0.Difficulty High Phase.0 Difficulty
5 FS5 B Phase.0.Difficulty High Phase.0 Difficulty
6 FS1 B Phase.1.Complexity Low Phase.1 Complexity
cost.approach.difficulty <- melt(cost.approach.difficulty,
id.vars = c("Difficulty", "Approach"), variable.name = "Phase")
cost.approach.difficulty$Phase <- as.character(cost.approach.difficulty$Phase)
cost.approach.difficulty$Type <- "Difficulty"
colnames(cost.approach.difficulty)[
colnames(cost.approach.difficulty) == "Difficulty"] <- "Size"
head(cost.approach.difficulty)
Size Approach Phase value Type
1 High B Phase.0 18102.778 Difficulty
2 Low B Phase.0 3860.694 Difficulty
3 Medium B Phase.0 5323.694 Difficulty
4 High I Phase.0 18102.778 Difficulty
5 Low I Phase.0 3860.694 Difficulty
6 Medium I Phase.0 5323.694 Difficulty
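As a quick check of what the two gsub calls above do, here is a minimal sketch that applies the same pattern to three of the melted column names. Names without a digit segment, such as Transaction.Feasibility, do not match the pattern and pass through unchanged, which is why that name ends up in both Phase and Type.

```r
# Same pattern as in the reshaping step: "<word>.<digits>.<word>"
pattern <- "(\\w+\\.\\d+)\\.(\\w+)"
vars <- c("Phase.0.Difficulty", "Phase.1.Complexity", "Transaction.Feasibility")

# First capture group: the phase part of the name
phase <- gsub(pattern, "\\1", vars, perl = TRUE)
# Second capture group: the type part of the name
type <- gsub(pattern, "\\2", vars, perl = TRUE)

print(phase)  # "Phase.0" "Phase.1" "Transaction.Feasibility"
print(type)   # "Difficulty" "Complexity" "Transaction.Feasibility"
```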
Once both tables are in the normalized format, you can call merge:
cost.table.filled <- merge(cost.table, cost.approach.difficulty,
by = c("Approach", "Size", "Phase", "Type"), all.x = TRUE, all.y = FALSE)
Then, for the columns where no value was looked up, you can put the original values back in (otherwise you end up with a bunch of NAs):
cost.table.filled$value[is.na(cost.table.filled$value)] <-
cost.table.filled$Size[is.na(cost.table.filled$value)]
Then you can dcast the whole thing back into its original format:
# Name the value column explicitly so dcast does not have to guess it
cost.table.final <- dcast(cost.table.filled, Identifier + Approach ~ Phase + Type,
value.var = "value")
head(cost.table.final)
Identifier Approach Phase.0_Difficulty Phase.1_Complexity Phase.2_Complexity Phase.3_Complexity Phase.4_Complexity Phase.5_Complexity Phase.6_Complexity Transaction.Feasibility_Transaction.Feasibility
1 FS1 B 3860.694 Low Low Medium Medium High High Medium
2 FS2 I 18102.778 High High Medium Medium Medium Medium Medium
3 FS3 B 18102.778 Low Low High High High High Medium
4 FS4 I 18102.778 Medium Medium Medium Medium Medium Medium Medium
5 FS5 B 18102.778 Medium Medium High Medium Medium Medium Medium
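To replace all of the columns, I would melt each lookup table and then rbind them together into a single lookup table. That way you only need to call merge once, and you don't have to worry about replacing NAs.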
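A minimal sketch of that idea, under loud assumptions: the two lookup tables and their values are made up for illustration (the question only shows a difficulty lookup, so complexity.lookup is hypothetical), and a small base-R helper, to_long, stands in for melt so the snippet has no package dependencies.

```r
# Hypothetical lookup tables; the numbers are invented for illustration
difficulty.lookup <- data.frame(
  Approach = c("B", "B"), Size = c("Low", "High"),
  Phase.0 = c(10, 20), stringsAsFactors = FALSE)
complexity.lookup <- data.frame(
  Approach = c("B", "B"), Size = c("Low", "High"),
  Phase.1 = c(30, 40), stringsAsFactors = FALSE)

# Base-R stand-in for melt(): one row per (Approach, Size, Phase)
to_long <- function(lookup, type) {
  phase.cols <- setdiff(names(lookup), c("Approach", "Size"))
  long <- do.call(rbind, lapply(phase.cols, function(p) {
    data.frame(Approach = lookup$Approach, Size = lookup$Size,
               Phase = p, value = lookup[[p]], stringsAsFactors = FALSE)
  }))
  long$Type <- type
  long
}

# Stack the long lookups by row into one combined lookup table
lookup <- rbind(to_long(difficulty.lookup, "Difficulty"),
                to_long(complexity.lookup, "Complexity"))

# A cost table that is already in long format (also made up)
cost.long <- data.frame(
  Identifier = c("FS1", "FS1"), Approach = c("B", "B"),
  Size = c("Low", "High"), Phase = c("Phase.0", "Phase.1"),
  Type = c("Difficulty", "Complexity"), stringsAsFactors = FALSE)

# A single merge now fills every row, so no NA patching is needed
filled <- merge(cost.long, lookup,
                by = c("Approach", "Size", "Phase", "Type"), all.x = TRUE)
print(filled)
```

Because every Type has its own rows in the combined lookup, the Type column in the merge keys keeps difficulty and complexity values from being mixed up.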
Answer 1 (score: 0)
In this case, merge should do the trick:
cost.table <- merge(
x = cost.table,
y = cost.approach.difficulty[c("Approach", "Difficulty", "Phase.0")],
by.x = c("Phase.0.Difficulty", "Approach"),
by.y = c("Difficulty", "Approach"), sort = FALSE
)
cost.table$Phase.0.Difficulty <- NULL
names(cost.table)[names(cost.table) == "Phase.0"] <- "Phase.0.Difficulty"
cost.table
Approach Identifier Phase.1.Complexity Phase.2.Complexity Phase.3.Complexity Phase.4.Complexity Phase.5.Complexity Phase.6.Complexity Transaction.Feasibility Phase.0.Difficulty
1 B FS1 Low Low Medium Medium High High Medium 3860.694
2 I FS2 High High Medium Medium Medium Medium Medium 18102.778
3 I FS4 Medium Medium Medium Medium Medium Medium Medium 18102.778
4 B FS3 Low Low High High High High Medium 18102.778
5 B FS5 Medium Medium High Medium Medium Medium Medium 18102.778
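One thing visible in the output above: FS4 comes before FS3, because merge does not promise to keep the original row order, even with sort = FALSE. Below is a minimal sketch of one way to restore it, assuming a temporary helper column (row.id is a hypothetical name) and using only a subset of the question's columns and lookup rows.

```r
# Subset of the question's data, enough to show the row-order issue
cost.table <- data.frame(
  Identifier = c("FS1", "FS2", "FS3", "FS4", "FS5"),
  Phase.0.Difficulty = c("Low", "High", "High", "High", "High"),
  Approach = c("B", "I", "B", "I", "B"),
  stringsAsFactors = FALSE)
lookup <- data.frame(
  Approach = c("B", "B", "I"),
  Difficulty = c("High", "Low", "High"),
  Phase.0 = c(18102.778, 3860.694, 18102.778),
  stringsAsFactors = FALSE)

# Remember the original row positions before merging
cost.table$row.id <- seq_len(nrow(cost.table))

merged <- merge(
  x = cost.table, y = lookup,
  by.x = c("Phase.0.Difficulty", "Approach"),
  by.y = c("Difficulty", "Approach"), sort = FALSE)

# Put the rows back in their original order and drop the helper column
merged <- merged[order(merged$row.id), ]
merged$row.id <- NULL
print(merged)
```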
Answer 2 (score: 0)
The simplest answer seems to be the following, which does the multi-column lookup in a single line:
cost.approach.difficulty$Phase.0[match(paste(cost.table$Approach,
cost.table$Phase.0.Difficulty), paste(cost.approach.difficulty$Approach,
cost.approach.difficulty$Difficulty))]
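To iterate over multiple columns, a for loop works fine.
Unfortunately, I was hoping for a native solution that would take a vector of columns and combine them for the lookup, but I haven't found one yet. I will check other packages to see whether such a function exists.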
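The for loop mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a two-row, two-phase subset of the question's data, it assumes the same lookup table applies to every Phase column, and make_key is a hypothetical helper that builds one composite key from a vector of column names via do.call(paste, ...).

```r
# Small subset of the question's data (two rows, two phases)
cost.table <- data.frame(
  Identifier = c("FS1", "FS2"),
  Phase.0.Difficulty = c("Low", "High"),
  Phase.1.Complexity = c("Low", "High"),
  Approach = c("B", "I"),
  stringsAsFactors = FALSE)
lookup <- data.frame(
  Approach = c("B", "B", "I", "I"),
  Difficulty = c("High", "Low", "High", "Low"),
  Phase.0 = c(18102.778, 3860.694, 18102.778, 3860.694),
  Phase.1 = c(29481.67, 15978.47, 74184.44, 26008.89),
  stringsAsFactors = FALSE)

# Hypothetical helper: paste any vector of columns into one key per row
make_key <- function(df, cols) do.call(paste, unname(as.list(df[cols])))

lookup.key <- make_key(lookup, c("Approach", "Difficulty"))
phase.cols <- grep("^Phase\\.", names(cost.table), value = TRUE)

result <- cost.table
for (col in phase.cols) {
  # "Phase.0.Difficulty" -> "Phase.0", the matching lookup column
  value.col <- sub("\\.[^.]+$", "", col)
  row.key <- make_key(cost.table, c("Approach", col))
  result[[col]] <- lookup[[value.col]][match(row.key, lookup.key)]
}
print(result)
```

Deriving the lookup column from the cost.table column name avoids hard-coding column names, at the price of relying on the Phase.N.Something naming convention.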