R

Time: 2015-10-17 17:38:51

Tags: r performance match lookup memory-efficient

Starting with 2 objects: a data frame of order attributes (order number, weight, and volume) and a list of combinations of order numbers.

attr <- data.frame(Order.No = c(111,222,333), Weight = c(20,75,50), Volume = c(10,30,25))
combn <- list(111, 222, 333, c(111,222), c(111,333), c(222,333), c(111,222,333))

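The goal is to find the total weight and cube (volume) for each combination of orders, and to keep only the combinations that fall within both the weight and the cube constraints.

I am currently using the following: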

# Lookup weights for each Order.No in the attr table
# Add up total weight for the combination and keep it if it's in the range
wgts <- lapply(combn, function(x) {
    temp <- attr$Weight[match(x, attr$Order.No)]
    temp <- sum(temp)
    temp[temp <= 50 & temp >= 20]
})
> wgts
[[1]]
[1] 20

[[2]]
numeric(0)

[[3]]
[1] 50

[[4]]
numeric(0)

[[5]]
numeric(0)

[[6]]
numeric(0)

[[7]]
numeric(0)

# Lookup volumes for each Order.No in the attr table
# Add up total volume for the combination and keep it if it's in the range
vols <- lapply(combn, function(x) {
    temp <- attr$Volume[match(x, attr$Order.No)]
    temp <- sum(temp)
    temp[temp <= 50 & temp >= 10]
})
> vols
[[1]]
[1] 10

[[2]]
[1] 30

[[3]]
[1] 25

[[4]]
[1] 40

[[5]]
[1] 35

[[6]]
numeric(0)

[[7]]
numeric(0)

Then use mapply to merge the two lists of weights and volumes.

# Find and keep only the rows that have both the weights and volumes within their ranges  
which(lapply(mapply(c, wgts, vols), function(x) length(x)) == 2)

# Yields position 1 and 3 which meet the subsetting conditions    
> value value 
    1     3

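The code above looks up the weight and cube of each individual order, sums them per combination, checks each total against its range limits, merges the two lists, and keeps only the combinations whose weight and cube are both within the acceptable range.

My current solution completes the task, but it is very slow at production volumes and does not scale well to millions of records. With 11 MM (11 million) order combinations to look up, the process takes about 40 minutes to run, which is unacceptable.

I am looking for a more efficient approach that substantially reduces the run time while producing the same output.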

2 answers:

Answer 0 (score: 4)


Both filtering calls at the end of the code below give the same result:

# changing names, assigning indices to order list
atdf  = data.frame(Order.No = c(111,222,333), Weight = c(20,75,50), Volume = c(10,30,25))
olist = list(111, 222, 333, c(111,222), c(111,333), c(222,333), c(111,222,333))
olist <- setNames(olist,seq_along(olist))

# defining filtering predicate:

sel_orders = function(os, mins=c(20,10), maxs=c(50,50)) {
    tot = colSums(atdf[match(os, atdf$Order.No), c("Weight","Volume")])
    all(maxs >= tot & tot >= mins)
}

# Filtering orders

olist[sapply(olist, sel_orders)]
# or 
Filter(x = olist, f = sel_orders)

Change the maxs and mins arguments to apply different limits. With the defaults, the result is:

# $`1`
# [1] 111
# 
# $`3`
# [1] 333
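
As a small illustration of changing the limits, the sketch below repeats the answer's definitions so it runs on its own and then passes a different `maxs` through `sapply()`. The wider weight limit of 100 is a made-up value chosen only for the example; it is not from the question.

```r
# Data and predicate repeated from the answer above so the block is self-contained
atdf  <- data.frame(Order.No = c(111, 222, 333),
                    Weight   = c(20, 75, 50),
                    Volume   = c(10, 30, 25))
olist <- list(111, 222, 333, c(111, 222), c(111, 333),
              c(222, 333), c(111, 222, 333))
olist <- setNames(olist, seq_along(olist))

sel_orders <- function(os, mins = c(20, 10), maxs = c(50, 50)) {
  tot <- colSums(atdf[match(os, atdf$Order.No), c("Weight", "Volume")])
  all(maxs >= tot & tot >= mins)
}

# Hypothetical limits: weight up to 100 instead of 50, volume unchanged.
# Extra arguments given to sapply() are forwarded to sel_orders().
wider <- olist[sapply(olist, sel_orders, maxs = c(100, 50))]
names(wider)
```

With these assumed limits, combinations 1 through 5 pass, while the two combinations containing both 222 and 333 are still rejected.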

Answer 1 (score: 1)

I don't know how fast this will be, but here is a dplyr / tidyr solution.

library(dplyr)
library(tidyr)

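# Long format: one row per (combination, order) pair
combination = 
  tibble(Order.No = combn) %>%   # tibble() replaces the deprecated data_frame()
  mutate(combination_ID = 1:n()) %>%
  unnest(Order.No)

# Join the attributes, total them per combination, keep those within both ranges
acceptable = 
  combination %>%
  left_join(attr, by = "Order.No") %>%
  group_by(combination_ID) %>%
  summarize(total_weight = sum(Weight),
            total_volume = sum(Volume)) %>%
  filter(between(total_weight, 20, 50) &
           between(total_volume, 10, 50))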
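
The same long-format idea can also be sketched in base R. It replaces the per-combination `lapply()` calls with one `match()` over all orders and one grouped sum via `rowsum()`. This is an untimed sketch, not a benchmarked solution; the names `orders`, `combo_id`, `rows`, `totals`, and `keep` are introduced here for illustration only.

```r
# Example data copied from the question
attr  <- data.frame(Order.No = c(111, 222, 333),
                    Weight   = c(20, 75, 50),
                    Volume   = c(10, 30, 25))
combn <- list(111, 222, 333, c(111, 222), c(111, 333),
              c(222, 333), c(111, 222, 333))

# Long format: one element per (combination, order) pair
orders   <- unlist(combn)
combo_id <- rep(seq_along(combn), lengths(combn))

# One vectorised lookup for every order in every combination
rows <- match(orders, attr$Order.No)

# Per-combination totals: rowsum() returns one row per distinct combo_id
totals <- rowsum(as.matrix(attr[rows, c("Weight", "Volume")]), group = combo_id)

# Keep combinations whose totals fall inside both ranges
keep <- totals[, "Weight"] >= 20 & totals[, "Weight"] <= 50 &
        totals[, "Volume"] >= 10 & totals[, "Volume"] <= 50
acceptable_ids <- which(keep)
```

On the example data, `acceptable_ids` picks out combinations 1 and 3, matching the output in the question. Whether it is actually faster on 11 million combinations would need to be measured.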