Find preceding larger occurrences in data.table

Time: 2018-01-02 20:32:00

Tags: r data.table

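I have a large data file containing dates and quantities for many different references. Each row is a transaction with a date and a quantity. I need to find out whether transactions below a threshold were preceded by a larger transaction (in terms of quantity). I have achieved this, but could not think of a less convoluted way, which I am sure exists. I would appreciate any hints. Here is a fully reproducible example: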

# load required package
require(data.table)

# make it fully reproducible
set.seed(1)
a <- data.table(ref = sample(LETTERS[1:10], 300, TRUE), dates = sample(seq(as.Date("2017-08-01"), as.Date("2017-12-01"), "day"), 300, TRUE), qty = sample(1:500, 300, TRUE))

# Compute some intermediate tables
#   First one has all records below the threshold (20) with their dates
temp1 <- a[, .(dates, qLess = qty < 20, qty), by = ref][qLess == TRUE,]

#   Second one has all records above threshold with minimum dates
temp2 <- a[, .(qGeq = qty >= 20, dates), by = ref][qGeq == TRUE,][, min(dates), by = ref]

# Join both tables on ref, filter those below the threshold and filter the ones that are actually preceded (prec) by a larger order. THIS IS THE EXPECTED RESULT
temp1[temp2, on = "ref"][, prec := V1 < dates][qLess == TRUE,][prec == TRUE,]

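The expected result should contain at least the ref and whether or not it was preceded by a larger transaction, but preferably also the quantity and date (of the below-threshold transaction) and the preceding date (as in the example provided).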

2 answers:

Answer 0 (score: 2)

Another approach, using only the non-equi join capabilities of data.table:

setorder(a, ref, dates)
a[qty < 20][a[qty >= 20]
            , on = .(ref, dates > dates)
            , prev.big.date := i.dates, by = .EACHI][]
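To make the effect of the update join easier to follow, here is a minimal sketch on a small toy table. The table `toy` and its values are made up for illustration and do not come from the question; the names `small`, `big` and `res` are likewise hypothetical.

```r
library(data.table)

# Hypothetical toy data, not taken from the question
toy <- data.table(
  ref   = c("A", "A", "A", "B", "B"),
  dates = as.Date(c("2017-08-01", "2017-08-05", "2017-08-09",
                    "2017-08-02", "2017-08-03")),
  qty   = c(100L, 5L, 7L, 3L, 50L)
)
setorder(toy, ref, dates)

small <- toy[qty < 20]   # transactions below the threshold
big   <- toy[qty >= 20]  # transactions at or above the threshold

# For every big row, stamp its date onto the small rows of the same ref
# that fall strictly after it; later big rows overwrite earlier ones,
# so each small row ends up with the most recent preceding big date.
small[big, on = .(ref, dates > dates), prev.big.date := i.dates, by = .EACHI]

# Small transactions that were preceded by a larger one
res <- small[!is.na(prev.big.date)]
```

With this toy data, the two small transactions for ref A should pick up the big order dated 2017-08-01, while the small transaction for ref B keeps NA because its only big order comes later.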

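Answer 1 (score: 0)

This is fairly straightforward. We set the key so the table is sorted by ref and dates, flag "big" orders with 1, set the preceding big-order date to NA for small orders and to the order's own date for big orders, and then fill the big-order dates forward. The result contains, for each small order, the date of the most recent big order, or a missing value if there was no earlier big order.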

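# sort by ref and dates
setkey(a, ref, dates)
# flag big orders (1) versus small ones (0)
a[, is_big := (qty >= 20) + 0L]
# big orders carry their own date; small orders start as NA
a[is_big == 1, preceding_big_date := dates]
# fill forward within each ref; na.rm = FALSE keeps leading NAs so lengths match
a[, preceding_big_date := zoo::na.locf(preceding_big_date, na.rm = FALSE), by = ref]
new_result <- a[is_big == 0, ]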
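The forward-fill idea can also be sketched without the zoo dependency. The version below is an alternative to `zoo::na.locf`, not the answer's own code: it uses a running count of big orders as an index into the big-order dates. The helper `last_big_date` and the toy table are hypothetical and only serve as illustration.

```r
library(data.table)

# Hypothetical toy data, not taken from the question
toy <- data.table(
  ref   = c("A", "A", "A", "B", "B"),
  dates = as.Date(c("2017-08-01", "2017-08-05", "2017-08-09",
                    "2017-08-02", "2017-08-03")),
  qty   = c(100L, 5L, 7L, 3L, 50L)
)
setkey(toy, ref, dates)

# Hypothetical helper: for each row, the date of the latest big order so far.
# Assumes `dates` is already sorted within the group.
last_big_date <- function(dates, is_big) {
  idx <- cumsum(is_big)          # number of big orders seen up to this row
  idx[idx == 0L] <- NA_integer_  # no big order yet: result is NA
  dates[is_big][idx]             # look up the matching big-order date
}

toy[, preceding_big_date := last_big_date(dates, qty >= 20), by = ref]
res2 <- toy[qty < 20]
```

Unlike the strict `dates > dates` condition in the non-equi join, a big order that shares a date with a small one may count as preceding here, depending on the row order within that date.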