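I have a fairly complex problem. I have different companies and different buyers. I also have different products, up to 15 of them. Every product has a Price. In my example, this Price applies to different product sets, named Product - Set 1 through Product - Set 6.
Now I want to loop over all Companies, check their Buyers, and test whether the Price of Product - ALL equals the maximum price across Product Sets 1 to 6 (in my example) for the selected Company and Buyer.
Here is an example of what I tried: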
> dput(sys)
structure(list(Company = c("Company 1", "Company 2", "Company 3",
"Company 2", "Company 2", "Company 2", "Company 3", "Company 3",
"Company 5", "Company 5", "Company 5", "Company 2", "Company 2",
"Company 2", "Company 2", "Company 2"), Buyer = c("Buyer 1",
"Buyer 2", "Buyer 1", "Buyer 1", "Buyer 1", "Buyer 2", "Buyer 2",
"Buyer 1", "Buyer 3", "Buyer 1", "Buyer 3", "Buyer 2", "Buyer 2",
"Buyer 2", "Buyer 2", "Buyer 2"), Products = c("Product - ALL",
"Product - Set 1", "Product - Set 2", "Product - Set 1", "Product - ALL",
"Product - ALL", "Product - ALL", "Product - Set 1", "Product - ALL",
"Product - Set 1", "Product - Set 2", "Product - Set 2", "Product - Set 3",
"Product - Set 4", "Product - Set 5", "Product - Set 6"), Price = c(NA,
10L, 99L, 13L, 13L, 12L, 99L, 99L, 100L, 100L, 100L, 12L, NA,
11L, 0L, 12L)), .Names = c("Company", "Buyer", "Products", "Price"
), row.names = c(NA, -16L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x0000000000100788>)
>
> df <- sys[ (sys$Company =="Company 2" & sys$Buyer == "Buyer 2"), ]
>
> #replace all NAs with 0
> df[is.na(df)] <- 0
>
> #Fill control column with null
> df$ControlColumn <- "null"
>
> if(length(grep("Product - ALL", df$Products)) > 0) {
+ i <- grep("Product - ALL", df$Products)
+ prodSet1 <- grep("Product - Set 1", df$Products)
+ prodSet2 <- grep("Product - Set 2", df$Products)
+ prodSet3 <- grep("Product - Set 3", df$Products)
+ prodSet4 <- grep("Product - Set 4", df$Products)
+ prodSet5 <- grep("Product - Set 5", df$Products)
+ prodSet6 <- grep("Product - Set 6", df$Products)
+ val <- max(df[prodSet1]$Price, df[prodSet2]$Price,df[prodSet3]$Price,df[prodSet4]$Price,df[prodSet5]$Price,df[prodSet6]$Price)
+ df[i]$Price == val
+ df[i]$ControlColumn <- (df[i]$Price == val)
+ }
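However, I am struggling to automate this task for the whole input data set. Any suggestions on how to automate this process?
Thank you for your replies.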
Answer 0 (score: 2)
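You can make better use of the fact that the sys dataset is a data.table.
First, you can find the Products with the highest price for each Company and Buyer (we do not want these products to be Product - ALL):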
max.prices <- sys[Products!='Product - ALL',.SD[which.max(Price)],by=.(Company,Buyer)]
#      Company   Buyer        Products Price
# 1: Company 2 Buyer 2 Product - Set 2    12
# 2: Company 3 Buyer 1 Product - Set 2    99
# 3: Company 2 Buyer 1 Product - Set 1    13
# 4: Company 5 Buyer 1 Product - Set 1   100
# 5: Company 5 Buyer 3 Product - Set 2   100
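To illustrate how which.max behaves here, the following is a minimal sketch on a toy table that mirrors the Company 2 / Buyer 2 prices. The names toy, first.max and all.max are hypothetical and not part of the original answer. which.max skips NA prices and returns only the first row holding the maximum, so a tie (Set 2 and Set 6 both at 12) yields a single row; the second query is one possible way to keep every tied row instead.

```r
library(data.table)

# Toy data (hypothetical): one Company/Buyer pair with an NA price and a tie at 12
toy <- data.table(
  Company  = "Company 2",
  Buyer    = "Buyer 2",
  Products = c("Product - Set 1", "Product - Set 2", "Product - Set 3", "Product - Set 6"),
  Price    = c(10L, 12L, NA, 12L)
)

# which.max ignores NA and picks the first maximum only
first.max <- toy[, .SD[which.max(Price)], by = .(Company, Buyer)]

# Alternative: keep every row that reaches the maximum (which() drops NA comparisons)
all.max <- toy[, .SD[which(Price == max(Price, na.rm = TRUE))], by = .(Company, Buyer)]

print(first.max)
print(all.max)
```

Which of the two is appropriate depends on whether only the maximum price or also the matching product names are needed later.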
max.prices may be useful for other purposes in further analysis, so you may want to create another dataset rather than modifying max.prices:
all.prods <- copy(max.prices)  # copy() is needed: plain <- would let := below modify max.prices too
all.prods[,Products:='Product - ALL']
#      Company   Buyer      Products Price
# 1: Company 2 Buyer 2 Product - ALL    12
# 2: Company 3 Buyer 1 Product - ALL    99
# 3: Company 2 Buyer 1 Product - ALL    13
# 4: Company 5 Buyer 1 Product - ALL   100
# 5: Company 5 Buyer 3 Product - ALL   100
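Because the := operator updates a data.table by reference, whether all.prods is an independent copy of max.prices matters at this step. The sketch below contrasts copy() with plain assignment on a toy table; the names a, b, d, unchanged and shared are hypothetical and used only for illustration.

```r
library(data.table)

# Toy table (hypothetical names)
a <- data.table(Products = c("Product - Set 1", "Product - Set 2"),
                Price    = c(13L, 12L))

# copy() creates an independent table, so updating d leaves a alone
d <- copy(a)
d[, Products := "Product - ALL"]
unchanged <- identical(a$Products, c("Product - Set 1", "Product - Set 2"))

# Plain assignment only creates a second name for the same table,
# so updating b by reference also changes a
b <- a
b[, Products := "Product - ALL"]
shared <- identical(a$Products, b$Products)

print(unchanged)
print(shared)
print(a)
```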
Now all the 'Product - ALL' entries can be replaced by the updated ones:
result <- rbind(all.prods,sys[Products!='Product - ALL'])
The code below sorts the result and prints it:
setkey(result,Company,Buyer)
result
#       Company   Buyer        Products Price
#  1: Company 2 Buyer 1   Product - ALL    13
#  2: Company 2 Buyer 1 Product - Set 1    13
#  3: Company 2 Buyer 2   Product - ALL    12
#  4: Company 2 Buyer 2 Product - Set 1    10
#  5: Company 2 Buyer 2 Product - Set 2    12
#  6: Company 2 Buyer 2 Product - Set 3    NA
#  7: Company 2 Buyer 2 Product - Set 4    11
#  8: Company 2 Buyer 2 Product - Set 5     0
#  9: Company 2 Buyer 2 Product - Set 6    12
# 10: Company 3 Buyer 1   Product - ALL    99
# 11: Company 3 Buyer 1 Product - Set 2    99
# 12: Company 3 Buyer 1 Product - Set 1    99
# 13: Company 5 Buyer 1   Product - ALL   100
# 14: Company 5 Buyer 1 Product - Set 1   100
# 15: Company 5 Buyer 3   Product - ALL   100
# 16: Company 5 Buyer 3 Product - Set 2   100
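The steps above overwrite the Product - ALL prices. If the goal is instead to test the existing Product - ALL price against the set maximum, as the question describes with its ControlColumn, one possible sketch is shown below. It is not part of the original answer: the names set.max, MaxSetPrice and check are hypothetical, and ignoring NA prices via na.rm = TRUE is an assumption about the intended behaviour.

```r
library(data.table)

# Sample data from the question, rebuilt without the .internal.selfref pointer
sys <- data.table(
  Company = c("Company 1", "Company 2", "Company 3", "Company 2", "Company 2",
              "Company 2", "Company 3", "Company 3", "Company 5", "Company 5",
              "Company 5", "Company 2", "Company 2", "Company 2", "Company 2",
              "Company 2"),
  Buyer = c("Buyer 1", "Buyer 2", "Buyer 1", "Buyer 1", "Buyer 1", "Buyer 2",
            "Buyer 2", "Buyer 1", "Buyer 3", "Buyer 1", "Buyer 3", "Buyer 2",
            "Buyer 2", "Buyer 2", "Buyer 2", "Buyer 2"),
  Products = c("Product - ALL", "Product - Set 1", "Product - Set 2",
               "Product - Set 1", "Product - ALL", "Product - ALL",
               "Product - ALL", "Product - Set 1", "Product - ALL",
               "Product - Set 1", "Product - Set 2", "Product - Set 2",
               "Product - Set 3", "Product - Set 4", "Product - Set 5",
               "Product - Set 6"),
  Price = c(NA, 10L, 99L, 13L, 13L, 12L, 99L, 99L, 100L, 100L, 100L, 12L, NA,
            11L, 0L, 12L)
)

# Highest price among the product sets per Company/Buyer, ignoring NA prices
set.max <- sys[Products != "Product - ALL",
               .(MaxSetPrice = max(Price, na.rm = TRUE)),
               by = .(Company, Buyer)]

# Attach that maximum to each 'Product - ALL' row;
# Company/Buyer pairs without any product set get NA
check <- merge(sys[Products == "Product - ALL"], set.max,
               by = c("Company", "Buyer"), all.x = TRUE)

# TRUE where the 'Product - ALL' price equals the set maximum, NA where undecidable
check[, ControlColumn := Price == MaxSetPrice]
print(check)
```

ControlColumn is NA when either the Product - ALL price is missing or the pair has no product sets to compare against, which keeps those cases distinguishable from a genuine mismatch.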