Comparing the maximum value of lower nodes against the higher node

Time: 2015-02-02 15:32:27

Tags: r statistics

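I have a rather complex problem. I have different companies and different buyers. I also have different products, up to 15 of them. Every product has a Price. In my example, this Price applies to different product sets, named Product - Set 1 through Product - Set 6.

Now I want to loop over all Companies, check their Buyers, and test whether the Price on the Product - ALL node equals the maximum of the prices on the Product - Set 1 to Product - Set 6 nodes (in my example) for the selected Company and Buyer.

I tried an example: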

> dput(sys)
structure(list(Company = c("Company 1", "Company 2", "Company 3", 
"Company 2", "Company 2", "Company 2", "Company 3", "Company 3", 
"Company 5", "Company 5", "Company 5", "Company 2", "Company 2", 
"Company 2", "Company 2", "Company 2"), Buyer = c("Buyer 1", 
"Buyer 2", "Buyer 1", "Buyer 1", "Buyer 1", "Buyer 2", "Buyer 2", 
"Buyer 1", "Buyer 3", "Buyer 1", "Buyer 3", "Buyer 2", "Buyer 2", 
"Buyer 2", "Buyer 2", "Buyer 2"), Products = c("Product - ALL", 
"Product - Set 1", "Product - Set 2", "Product - Set 1", "Product - ALL", 
"Product - ALL", "Product - ALL", "Product - Set 1", "Product - ALL", 
"Product - Set 1", "Product - Set 2", "Product - Set 2", "Product - Set 3", 
"Product - Set 4", "Product - Set 5", "Product - Set 6"), Price = c(NA, 
10L, 99L, 13L, 13L, 12L, 99L, 99L, 100L, 100L, 100L, 12L, NA, 
11L, 0L, 12L)), .Names = c("Company", "Buyer", "Products", "Price"
), row.names = c(NA, -16L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x0000000000100788>)
> 
> df <- sys[ (sys$Company =="Company 2" & sys$Buyer == "Buyer 2"), ]
> 
> #replace all NAs with 0
> df[is.na(df)] <- 0
> 
> #Fill control column with null
> df$ControlColumn <- "null"
> 
> if(length(grep("Product - ALL", df$Products)) > 0) {
+  i <- grep("Product - ALL", df$Products)
+  prodSet1 <- grep("Product - Set 1", df$Products)
+  prodSet2 <- grep("Product - Set 2", df$Products)
+  prodSet3 <- grep("Product - Set 3", df$Products)
+  prodSet4 <- grep("Product - Set 4", df$Products)
+  prodSet5 <- grep("Product - Set 5", df$Products)
+  prodSet6 <- grep("Product - Set 6", df$Products)
+  val <- max(df[prodSet1]$Price, df[prodSet2]$Price,df[prodSet3]$Price,df[prodSet4]$Price,df[prodSet5]$Price,df[prodSet6]$Price)
+  df[i]$Price == val
+  df[i]$ControlColumn <- (df[i]$Price == val)
+ }

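However, I am struggling to automate this task for my input data. Any suggestions on how to automate this process for such a complex problem?

Thank you for your replies.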

1 answer:

Answer 0: (score: 2)

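You can make better use of the fact that your sys dataset is a data.table.

First, you can find the Products with the highest Price for a given Company and Buyer (we do not want these products to be Product - ALL):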

max.prices <- sys[Products!='Product - ALL',.SD[which.max(Price)],by=.(Company,Buyer)]
#      Company   Buyer        Products Price
# 1: Company 2 Buyer 2 Product - Set 2    12
# 2: Company 3 Buyer 1 Product - Set 2    99
# 3: Company 2 Buyer 1 Product - Set 1    13
# 4: Company 5 Buyer 1 Product - Set 1   100
# 5: Company 5 Buyer 3 Product - Set 2   100
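
Why Company 2 / Buyer 2 yields Product - Set 2 rather than Product - Set 6 (both cost 12), and why the NA price does not get in the way, follows from how which.max() behaves. A minimal sketch on a plain vector; the name prices is only illustrative and holds the Set 1 to Set 6 prices of that pair from the sample data:

```r
# Prices of Company 2 / Buyer 2 for Product - Set 1 ... Set 6 (from the sample data)
prices <- c(10L, 12L, NA, 11L, 0L, 12L)

# which.max() discards NA and returns the index of the FIRST maximum
idx <- which.max(prices)
idx          # 2, i.e. Product - Set 2 (Set 6 ties at 12 but comes later)
prices[idx]  # 12

# max() on its own propagates NA unless NA values are removed explicitly
max(prices)                # NA
max(prices, na.rm = TRUE)  # 12
```

So, at least for this step, replacing NA with 0 beforehand should not be necessary.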

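max.prices may be useful for other purposes in further analysis, so you may want to create another dataset instead of modifying max.prices:

# copy() is required here: a plain `<-` does not copy a data.table,
# so := below would otherwise change max.prices as well
all.prods <- copy(max.prices)
all.prods[,Products:='Product - ALL']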
#      Company   Buyer      Products Price
# 1: Company 2 Buyer 2 Product - ALL    12
# 2: Company 3 Buyer 1 Product - ALL    99
# 3: Company 2 Buyer 1 Product - ALL    13
# 4: Company 5 Buyer 1 Product - ALL   100
# 5: Company 5 Buyer 3 Product - ALL   100
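
One point worth checking at this step: in data.table, := updates a table by reference, and assigning a data.table to a new name with <- does not create a copy. The sketch below uses small toy tables (dt, dt2, alias and independent are illustrative names, not part of the original data) to show the difference between an alias and an explicit copy():

```r
library(data.table)

dt <- data.table(Products = c("Product - Set 1", "Product - Set 2"),
                 Price    = c(13L, 99L))

alias <- dt                          # no copy: both names point to the same table
alias[, Products := "Product - ALL"]
dt$Products                          # dt has been changed too

dt2 <- data.table(Products = c("Product - Set 1", "Product - Set 2"),
                  Price    = c(13L, 99L))

independent <- copy(dt2)             # explicit deep copy
independent[, Products := "Product - ALL"]
dt2$Products                         # dt2 still holds the original set names
```

If max.prices should keep its original Products values, the copy() form is the one to use.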

Now all 'Product - ALL' entries can be replaced by the updated entries:

result <- rbind(all.prods,sys[Products!='Product - ALL'])

The code below sorts the result and prints it:

setkey(result,Company,Buyer)
result
#       Company   Buyer        Products Price
#  1: Company 2 Buyer 1   Product - ALL    13
#  2: Company 2 Buyer 1 Product - Set 1    13
#  3: Company 2 Buyer 2   Product - ALL    12
#  4: Company 2 Buyer 2 Product - Set 1    10
#  5: Company 2 Buyer 2 Product - Set 2    12
#  6: Company 2 Buyer 2 Product - Set 3    NA
#  7: Company 2 Buyer 2 Product - Set 4    11
#  8: Company 2 Buyer 2 Product - Set 5     0
#  9: Company 2 Buyer 2 Product - Set 6    12
# 10: Company 3 Buyer 1   Product - ALL    99
# 11: Company 3 Buyer 1 Product - Set 2    99
# 12: Company 3 Buyer 1 Product - Set 1    99
# 13: Company 5 Buyer 1   Product - ALL   100
# 14: Company 5 Buyer 1 Product - Set 1   100
# 15: Company 5 Buyer 3   Product - ALL   100
# 16: Company 5 Buyer 3 Product - Set 2   100
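
If the aim is to test the existing Product - ALL prices rather than overwrite them, as the ControlColumn in the question suggests, a variation along the following lines may help. It is only a sketch under a few assumptions: the names set.max, MaxSetPrice and check are made up for illustration, NA prices are treated as "no match" instead of being replaced by 0, and sys is rebuilt here only so that the block runs on its own. It also keeps the pairs that have a Product - ALL row but no set rows (Company 1 / Buyer 1 and Company 3 / Buyer 2 in the sample), which do not appear in result above.

```r
library(data.table)

# Sample data from the question, rebuilt compactly
sys <- data.table(
  Company  = paste("Company", c(1, 2, 3, 2, 2, 2, 3, 3, 5, 5, 5, 2, 2, 2, 2, 2)),
  Buyer    = paste("Buyer",   c(1, 2, 1, 1, 1, 2, 2, 1, 3, 1, 3, 2, 2, 2, 2, 2)),
  Products = paste("Product -", c("ALL", "Set 1", "Set 2", "Set 1", "ALL", "ALL",
                                  "ALL", "Set 1", "ALL", "Set 1", "Set 2", "Set 2",
                                  "Set 3", "Set 4", "Set 5", "Set 6")),
  Price    = c(NA, 10L, 99L, 13L, 13L, 12L, 99L, 99L, 100L, 100L, 100L,
               12L, NA, 11L, 0L, 12L)
)

# Highest set price per Company/Buyer; NA prices are dropped first,
# so pairs without any usable set price simply do not appear here
set.max <- sys[Products != "Product - ALL" & !is.na(Price),
               .(MaxSetPrice = max(Price)),
               by = .(Company, Buyer)]

# Attach the maximum to every Product - ALL row; all.x = TRUE keeps
# pairs without set rows (their MaxSetPrice is NA)
check <- merge(sys[Products == "Product - ALL"], set.max,
               by = c("Company", "Buyer"), all.x = TRUE)

# TRUE only when both prices are known and equal (assumption: NA means "no match")
check[, ControlColumn := !is.na(Price) & !is.na(MaxSetPrice) & Price == MaxSetPrice]
check
```

With the sample data, ControlColumn is TRUE for Company 2 / Buyer 1, Company 2 / Buyer 2 and Company 5 / Buyer 3, and FALSE for the two pairs that have no set prices to compare against.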