I think I have a complicated problem (or at least it is for me!).
I have a price table that I need to read in from a CSV, and it looks exactly like this:
V1 <- c("","Destination","Spain","Spain","Spain","Portugal","Portugal","Portugal","Italy","Italy","Italy")
V2 <- c("","Min_Duration",rep(c(1,3,6),3))
V3 <- c("","Max_Duration",rep(c(2,5,10),3))
V4 <- c("Full-board","Level_1",runif(9,100,200))
V5 <- c("Full-board","Level_2",runif(9,201,500))
V6 <- c("Full-board","Level_3",runif(9,501,1000))
V7 <- c("Half-board","Level_1",runif(9,100,200))
V8 <- c("Half-board","Level_2",runif(9,201,500))
V9 <- c("Half-board","Level_3",runif(9,501,1000))
Lookup_matrix <- as.data.frame(cbind(V1,V2,V3,V4,V5,V6,V7,V8))
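Because the prices are completely random, they will of course look a bit odd in the table above, but we can ignore that...
I also have a table like this: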
Destination <- c("Spain", "Italy", "Portugal")
Duration <- c(2,4,8)
Level <- c(1,3,3)
Board <- c("Half-board","Half-board","Full-board")
Price <- "Empty"
Price_matrix <- as.data.frame(cbind(Destination,Duration,Level,Board,Price))
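My question is: how can I fill the "Price" column of Price_matrix with the corresponding price found in Lookup_matrix? Note that the Duration value in Price_matrix has to fall within the range between the "Min_Duration" and "Max_Duration" columns of Lookup_matrix.
In Excel I would use an INDEX/MATCH formula, but I'm stuck on how to do this in R.
Thanks in advance, Dan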
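Since the question frames this as an Excel INDEX/MATCH problem, a minimal base-R sketch of that same idea may help: MATCH the lookup row whose Destination matches and whose Min_Duration..Max_Duration range contains the requested Duration, then INDEX the column named after Board and Level. It assumes the two header rows have already been merged into single column names such as Full-board_Level_1, and it uses small made-up prices; lookup_price is a hypothetical helper name, not an existing function.

```r
# Small lookup table with made-up prices, already in "one header row" form
lookup <- data.frame(
  Destination  = c("Spain", "Spain", "Italy"),
  Min_Duration = c(1, 3, 1),
  Max_Duration = c(2, 5, 10),
  `Full-board_Level_1` = c(150, 160, 170),
  `Half-board_Level_1` = c(110, 120, 130),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: an INDEX/MATCH with a range condition
lookup_price <- function(dest, dur, level, board, table) {
  # MATCH: first row with the right destination whose range contains dur
  row <- which(table$Destination == dest &
               table$Min_Duration <= dur & dur <= table$Max_Duration)[1]
  # INDEX: the price column is named "<Board>_Level_<Level>"
  col <- paste0(board, "_Level_", level)
  if (is.na(row) || !(col %in% names(table))) return(NA_real_)
  table[[col]][row]
}

prices <- data.frame(
  Destination = c("Spain", "Italy", "Spain"),
  Duration    = c(4, 8, 7),
  Level       = c(1, 1, 1),
  Board       = c("Half-board", "Full-board", "Full-board"),
  stringsAsFactors = FALSE
)

# Apply the lookup to every row; Spain with Duration 7 has no matching range
prices$Price <- mapply(lookup_price, prices$Destination, prices$Duration,
                       prices$Level, prices$Board,
                       MoreArgs = list(table = lookup))
print(prices)
```

A row-by-row lookup like this is easy to read, but it is likely to be slower on large tables than the join-based approaches in the answers.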
Answer 0 (score: 1)
Here is a tidyverse option.
First, note that I renamed your input objects; Price_matrix and Lookup_matrix are both data.frames (not matrices).
df1 <- Price_matrix
df2 <- Lookup_matrix
Next, we need to fix the column names of df2 = Lookup_matrix.
# Fix column names
colnames(df2) <- gsub("^_", "", apply(df2[1:2, ], 2, paste0, collapse = "_"))
df2 <- df2[-(1:2), ]
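To see what the column-name fix produces, here is the same apply()/gsub() step run on a tiny made-up table with the two-header-row layout (toy is a hypothetical name). The empty first header cell of the first three columns is what leaves the leading "_" that gsub("^_", "", ...) strips off.

```r
# Toy version of the CSV layout: two header rows followed by one data row
toy <- data.frame(
  V1 = c("", "Destination", "Spain"),
  V2 = c("", "Min_Duration", "1"),
  V3 = c("", "Max_Duration", "2"),
  V4 = c("Full-board", "Level_1", "150"),
  V5 = c("Half-board", "Level_1", "110"),
  stringsAsFactors = FALSE
)

# Paste each column's two header cells together with "_"
combined <- apply(toy[1:2, ], 2, paste0, collapse = "_")
print(combined)
# Drop the leading "_" left over from the empty first header cell
colnames(toy) <- gsub("^_", "", combined)
# Remove the two header rows, keeping only the data
toy <- toy[-(1:2), ]
print(names(toy))
```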
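We now essentially do a left join of df1 and df2. To get df2 into a suitable format, we reshape the data from wide to long, extract the Board and Level values for every Price, and expand each duration range from Min_Duration to Max_Duration into individual durations. We then join by Destination, Duration, Level and Board.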
Note that in your example there is no Level = 3 entry for Board = Half-board in Lookup_matrix (V9 is never passed to cbind), so we get Price = NA for the Italy entry.
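As a hedged stand-in for these tidyverse steps, the following base-R sketch (a swapped-in technique, not the answer's original code) performs the same four steps on small made-up data: reshape wide to long, split each price column name into Board and Level, expand every Min_Duration..Max_Duration range into one row per whole duration, and left-join with merge(all.x = TRUE). Expanding ranges this way assumes durations are whole numbers.

```r
# Lookup table after the column-name fix (made-up prices)
df2 <- data.frame(
  Destination  = c("Spain", "Italy"),
  Min_Duration = c(1, 3),
  Max_Duration = c(2, 5),
  `Full-board_Level_1` = c(150, 170),
  `Half-board_Level_1` = c(110, 130),
  check.names = FALSE, stringsAsFactors = FALSE
)
df1 <- data.frame(
  Destination = c("Spain", "Italy", "Italy"),
  Duration    = c(2, 4, 9),
  Level       = c(1, 1, 1),
  Board       = c("Half-board", "Full-board", "Full-board"),
  stringsAsFactors = FALSE
)

# 1. Wide to long: one row per (destination, duration range, price column)
id_cols <- c("Destination", "Min_Duration", "Max_Duration")
price_cols <- setdiff(names(df2), id_cols)
long <- do.call(rbind, lapply(price_cols, function(col) {
  data.frame(df2[id_cols], key = col, Price = df2[[col]],
             stringsAsFactors = FALSE)
}))

# 2. Split "Full-board_Level_1" into Board = "Full-board" and Level = 1
long$Board <- sub("_Level_.*", "", long$key)
long$Level <- as.numeric(sub(".*_Level_", "", long$key))

# 3. Expand each Min_Duration..Max_Duration range into one row per duration
reps <- long$Max_Duration - long$Min_Duration + 1
expanded <- long[rep(seq_len(nrow(long)), reps), ]
expanded$Duration <- unlist(Map(seq, long$Min_Duration, long$Max_Duration))

# 4. Left join: keep every row of df1; Price is NA when nothing matches
keys <- c("Destination", "Duration", "Level", "Board")
res <- merge(df1, expanded[c(keys, "Price")], by = keys, all.x = TRUE)
print(res)
```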
Answer 1 (score: 0)
Using data.table:
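library(data.table)
# Column names: paste the two header rows together; make.names() turns e.g.
# "Half-board Level_1" into "Half.board.Level_1", the form the regexes below expect
nms = make.names(trimws(do.call(paste, transpose(Lookup_matrix[1:2, ]))))
# Rewrite the data rows in the correct format and read them back so fread() sets the column classes.
# You do not have to: Lookup_matrix1 = setNames(Lookup_matrix[-(1:2), ], nms) is enough,
# but it will not have rectified the column classes.
cat(do.call(paste, c(collapse = "\n", Lookup_matrix[-(1:2), ])), file = "mm.csv")
Lookup_matrix1 = fread("mm.csv", col.names = nms)
# Price_matrix was built with cbind(), so Duration is text; make it numeric for the range conditions
Price_matrix$Duration = as.numeric(as.character(Price_matrix$Duration))
melt(Lookup_matrix1, 1:3)[,
  c("Board", "Level") := .(sub("[.]", "-", sub("\\.Leve.*", "", variable)), sub("\\D+", "", variable))][
  Price_matrix[, -5], on = c("Destination", "Board", "Level", "Min_Duration <= Duration", "Max_Duration >= Duration")]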
Destination Min_Duration Max_Duration variable value Board Level
1: Spain 2 2 Half.board.Level_1 105.2304 Half-board 1
2: Italy 4 4 <NA> NA Half-board 3
3: Portugal 8 8 Full.board.Level_3 536.5132 Full-board 3
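The on= argument above mixes equality conditions (Destination, Board, Level) with range conditions on Duration, which makes it a non-equi join. For readers without data.table, roughly the same result can be sketched in base R: join on the exact keys first, keep only the candidates whose range contains Duration, then re-attach unmatched rows with Price = NA. The data are made up, and lookup_long and queries are hypothetical names.

```r
# Long-format lookup: one row per destination / range / board / level (made-up prices)
lookup_long <- data.frame(
  Destination  = c("Spain", "Spain", "Portugal"),
  Min_Duration = c(1, 3, 6),
  Max_Duration = c(2, 5, 10),
  Board        = c("Half-board", "Half-board", "Full-board"),
  Level        = c(1, 1, 3),
  Price        = c(105, 115, 540),
  stringsAsFactors = FALSE
)
queries <- data.frame(
  Destination = c("Spain", "Italy", "Portugal"),
  Duration    = c(2, 4, 8),
  Level       = c(1, 3, 3),
  Board       = c("Half-board", "Half-board", "Full-board"),
  stringsAsFactors = FALSE
)
queries$id <- seq_len(nrow(queries))  # remember the original row order

# Equality part of the join: all candidate rows sharing the exact keys
cand <- merge(queries, lookup_long, by = c("Destination", "Board", "Level"))
# Range part of the join: keep candidates whose range contains Duration
hits <- cand[cand$Min_Duration <= cand$Duration & cand$Duration <= cand$Max_Duration, ]

# Re-attach queries without a hit (Price = NA) and restore the original order
res <- merge(queries, hits[c("id", "Price")], by = "id", all.x = TRUE)
res <- res[order(res$id), c("Destination", "Duration", "Level", "Board", "Price")]
print(res)
```

Unlike expanding each range into individual durations, this also works for non-integer durations; if ranges overlapped, a query could match more than one lookup row and appear more than once in the result.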