Range lookup with multiple variables in a matrix in R

Time: 2018-07-22 23:07:22

Tags: r lookup

I think I have a complicated problem (or at least it is for me!).

I have a price table that needs to be read from a csv, and it looks exactly like this:

V1 <- c("","Destination","Spain","Spain","Spain","Portugal","Portugal","Portugal","Italy","Italy","Italy")
V2 <- c("","Min_Duration",rep(c(1,3,6),3))
V3 <- c("","Max_Duration",rep(c(2,5,10),3))
V4 <- c("Full-board","Level_1",runif(9,100,200))
V5 <- c("Full-board","Level_2",runif(9,201,500))
V6 <- c("Full-board","Level_3",runif(9,501,1000))
V7 <- c("Half-board","Level_1",runif(9,100,200))
V8 <- c("Half-board","Level_2",runif(9,201,500))
V9 <- c("Half-board","Level_3",runif(9,501,1000))
Lookup_matrix <- as.data.frame(cbind(V1,V2,V3,V4,V5,V6,V7,V8))

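Because they are completely random, the prices in the table above will of course look a bit odd, but we can ignore that...

I also have a table like this: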

Destination <- c("Spain", "Italy", "Portugal")
Duration <- c(2,4,8)
Level <- c(1,3,3)
Board <- c("Half-board","Half-board","Full-board")
Price <- "Empty"
Price_matrix <- as.data.frame(cbind(Destination,Duration,Level,Board,Price))

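My question is: how can I fill the "Price" column of the price matrix with the corresponding price found in the lookup matrix? Note that the Duration variable of the price matrix has to fall within the range between the "Min_Duration" and "Max_Duration" columns of the lookup matrix.

In Excel I would use an INDEX/MATCH formula, but I am stuck in R.

Thanks in advance, Dan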

2 Answers:

Answer 0: (score: 1)

Here is a tidyverse possibility.

First, note that I renamed your input objects; Price_matrix and Lookup_matrix are both data.frames (not matrices).

df1 <- Price_matrix
df2 <- Lookup_matrix

Next, we need to fix the column names of df2 = Lookup_matrix.

# Fix column names
colnames(df2) <- gsub("^_", "", apply(df2[1:2, ], 2, paste0, collapse = "_"))
df2 <- df2[-(1:2), ]
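
To make the header fix easier to follow, here is a minimal sketch on a tiny stand-in table. The object `toy` and its values are made up for illustration; only the `apply`/`paste0`/`gsub` pattern comes from the code above.

```r
# Hypothetical stand-in for Lookup_matrix: two header rows, then data rows
toy <- data.frame(
  V1 = c("", "Destination", "Spain"),
  V2 = c("", "Min_Duration", "1"),
  V4 = c("Full-board", "Level_1", "150"),
  stringsAsFactors = FALSE
)

# Paste the two header rows together column by column; where the first
# header row is empty this leaves a leading "_", which gsub() removes
new_names <- gsub("^_", "", apply(toy[1:2, ], 2, paste0, collapse = "_"))

# Apply the combined names and drop the two header rows
colnames(toy) <- new_names
toy <- toy[-(1:2), ]
print(toy)
```

Columns with an empty first header row keep only the second-row label, while the price columns end up with names of the form Board_Level_n, which can later be split back into Board and Level.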

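We now essentially perform a left join of df1 and df2. To get df2 into a suitable format, we reshape the data from wide to long, extract the Board and Level values for each Price, and expand Duration from Min_Duration to Max_Duration. We then join by Destination, Duration, Level and Board.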

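Note that in your example there is no Half-board entry with Level = 3 for Destination = Italy in Lookup_matrix (V9 is defined but not passed to cbind), so we get Price = NA for this entry.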
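
The steps described in this answer can be sketched roughly as follows. This is not the original answer's code but a minimal illustration under stated assumptions: dplyr and tidyr (version 1.0 or later) are installed, the lookup table already has cleaned column names of the form Board_Level_n, and the objects `lookup`, `lookup_long`, `requests` and all prices are made up so that the example is deterministic. The Half-board Level 3 column is left out on purpose to mirror the missing V9.

```r
library(dplyr)
library(tidyr)

# Hypothetical cleaned lookup table (fixed prices instead of runif)
lookup <- data.frame(
  Destination  = rep(c("Spain", "Portugal", "Italy"), each = 3),
  Min_Duration = rep(c(1, 3, 6), 3),
  Max_Duration = rep(c(2, 5, 10), 3),
  `Full-board_Level_1` = 101:109,
  `Full-board_Level_3` = 501:509,
  `Half-board_Level_1` = 111:119,
  check.names = FALSE
)

# Wide to long, split the column names into Board and Level, then
# expand each Min/Max range into one row per possible Duration
lookup_long <- lookup %>%
  pivot_longer(
    cols      = -c(Destination, Min_Duration, Max_Duration),
    names_to  = c("Board", "Level"),
    names_sep = "_Level_",
    values_to = "Price"
  ) %>%
  mutate(
    Level    = as.numeric(Level),
    Duration = Map(seq, Min_Duration, Max_Duration)
  ) %>%
  unnest(cols = Duration) %>%
  mutate(Duration = as.numeric(Duration)) %>%
  select(Destination, Duration, Level, Board, Price)

# Hypothetical version of the price table with proper column types
requests <- data.frame(
  Destination = c("Spain", "Italy", "Portugal"),
  Duration    = c(2, 4, 8),
  Level       = c(1, 3, 3),
  Board       = c("Half-board", "Half-board", "Full-board"),
  stringsAsFactors = FALSE
)

# Left join keeps every request; unmatched requests get Price = NA
result <- left_join(requests, lookup_long,
                    by = c("Destination", "Duration", "Level", "Board"))
print(result)
```

Because a left join is used, the Italy request stays in the result with a missing Price rather than being dropped. Expanding every range into individual durations is simple but only works for whole-number durations.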

Answer 1: (score: 0)

Using data.table:

library(data.table)

nms = trimws(do.call(paste, transpose(Lookup_matrix[1:2, ])))# column names

cat(do.call(paste, c(collapse="\n", Lookup_matrix[-(1:2), ])), file = "mm.csv") 
  # Rewrite the data in the correct format. You do not have to.
  # Just doing Lookup_matrix1 = setNames(Lookup_matrix[-(1:2),],nms) is enough 
  # but it will not have rectified the column classes. 

Lookup_matrix1 = fread("mm.csv", col.names = nms)  

melt(Lookup_matrix1, 1:3)[,
        c("Board", "Level") := .(sub("[.]", "-", sub("\\.Leve.*", "", variable)), sub("\\D+", "", variable))][
        Price_matrix[, -5], on=c("Destination", "Board", "Level", "Min_Duration <= Duration", "Max_Duration >= Duration")]

  Destination Min_Duration Max_Duration           variable    value      Board Level
1:       Spain            2            2 Half.board.Level_1 105.2304 Half-board     1
2:       Italy            4            4               <NA>       NA Half-board     3
3:    Portugal            8            8 Full.board.Level_3 536.5132 Full-board     3
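
The key step in the code above is the non-equi join in `on=`, which matches Duration against the Min_Duration/Max_Duration range directly instead of expanding the ranges. Here is a minimal sketch of just that step, assuming data.table is installed; the tables `lookup_long` and `requests` and their prices are made up for illustration and are already in long format.

```r
library(data.table)

# Hypothetical long-format lookup: one row per destination/board/level/range
lookup_long <- data.table(
  Destination  = c("Spain", "Spain", "Italy"),
  Board        = c("Half-board", "Half-board", "Half-board"),
  Level        = c(1, 1, 1),
  Min_Duration = c(1, 3, 3),
  Max_Duration = c(2, 5, 5),
  Price        = c(110, 120, 130)
)

# Hypothetical requests to price
requests <- data.table(
  Destination = c("Spain", "Italy"),
  Board       = c("Half-board", "Half-board"),
  Level       = c(1, 3),
  Duration    = c(2, 4)
)

# For each request row, find the lookup row with the same Destination,
# Board and Level whose [Min_Duration, Max_Duration] range contains Duration
res <- lookup_long[requests,
                   on = .(Destination, Board, Level,
                          Min_Duration <= Duration,
                          Max_Duration >= Duration)]
print(res)
```

As in the output above, the Min_Duration and Max_Duration columns of the result carry the requested Duration, and a request with no matching row (Italy, Level 3 here) is kept with an NA price.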