KDB/Q: query for rows that match consecutive conditions?

Time: 2019-06-20 02:55:04

Tags: kdb

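I'm looking for (consecutive?) transactions in a shop dataset that follow a pattern: despite some earlier cancellations within the same day, the shop eventually completed the transaction.

A valid batch of transactions must satisfy a set of conditions.

  1. They should come from the same shop.
  2. They should eventually be completed, i.e. X cancellations but 1 completion.
  3. A pending batch of transactions (cancelled and completed) should not exceed a given time frame, e.g. 1 day.
  4. The transactions should carry the same cash amount to be considered the "same" transaction.
  5. Transactions should be partitioned by day, i.e. a pending batch should not be treated as continuing into the next day.
  6. Cancelled transactions whose amount is a power of ten (i.e. 10, 1000, 10000) should be ignored.

The query should keep every batch that meets the conditions above. The final table should contain a running count of the batches in a column named batch, to tell them apart.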

Initial table:

shop amount status    date    
------------------------------
A    1234   Cancelled 20101010
A    1234   Cancelled 20101010
A    1234   Completed 20101010
A    1234   Cancelled 20101010
A    1234   Completed 20101011
A    1000   Completed 20101011
B    100    Cancelled 20101011
B    100    Cancelled 20101011
B    4321   Cancelled 20101011
B    4321   Cancelled 20101011
C    333    Cancelled 20101012
C    333    Completed 20101012
C    333    Completed 20101012
D    111    Cancelled 20101013
D    155    Cancelled 20101013
D    111    Completed 20101013
D    155    Completed 20101013

Partitioned by day:

shop amount status    date    
------------------------------
A    1234   Cancelled 20101010
A    1234   Cancelled 20101010
A    1234   Completed 20101010
A    1234   Cancelled 20101010
------------------------------
A    1234   Completed 20101011
A    1000   Completed 20101011
B    100    Cancelled 20101011
B    100    Cancelled 20101011
B    4321   Cancelled 20101011
B    4321   Cancelled 20101011
------------------------------
C    333    Cancelled 20101012
C    333    Completed 20101012
C    333    Completed 20101012
------------------------------
D    111    Cancelled 20101013
D    155    Cancelled 20101013
D    111    Completed 20101013
D    155    Completed 20101013

Resulting table:

shop amount status    date     batch
-------------------------------------
A    1234   Cancelled 20101010   1
A    1234   Cancelled 20101010   1
A    1234   Completed 20101010   1
-------------------------------------
A    1234   Completed 20101011   2
A    1000   Completed 20101011   3
-------------------------------------
C    333    Cancelled 20101012   4
C    333    Completed 20101012   4
C    333    Completed 20101012   5
-------------------------------------
D    111    Cancelled 20101013   6
D    155    Cancelled 20101013   7
D    111    Completed 20101013   6
D    155    Completed 20101013   7

Query to build the table:

tab:([] shop:`A`A`A`A`A`A`B`B`B`B`C`C`C`D`D`D`D; amount: 1234 1234 1234 1234 1234 1000 100 100 4321 4321 333 333 333 111 155 111 155; status:`Cancelled`Cancelled`Completed`Cancelled`Completed`Completed`Cancelled`Cancelled`Cancelled`Cancelled`Cancelled`Completed`Completed`Cancelled`Cancelled`Completed`Completed; date: `20101010`20101010`20101010`20101010`20101011`20101011`20101011`20101011`20101011`20101011`20101012`20101012`20101012`20101013`20101013`20101013`20101013)

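Explanation:

  1. On day 1, A makes 4 transactions. The first three form one batch because they have the same amount: [Cancelled -> Cancelled -> Completed]. The last transaction of the day is ignored because the day ends before it is completed.

  2. On day 2, A makes a transaction for 1234, but the previous day's transaction is not part of its batch. Another transaction, for 1000, is completed. B makes four transactions, but they are not tracked because they are either a) cancelled or b) a power of ten.

  3. On day 3, C makes three transactions for the same amount. These are two batches: the first Cancelled and Completed form the initial batch, and the final Completed transaction is a batch on its own.

  4. On day 4, D makes four transactions, which fall into two batches. Note that the transactions here are not consecutive: there are two cancelled transactions with different amounts, but both are completed later.

The table is sorted by timestamp and date, i.e. from 23:59:59 to 00:00:00. The query does not need to be a one-liner; it can be a multi-line query that writes to temporary tables, variables, and so on.

It would also be helpful to have a way to get the number of cancelled transactions per batch.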

2 answers:

Answer 0 (score: 6)

First, count the completed transactions, since each one closes a batch.

q)n:count select from tab where status=`Completed

Then assign a batch number to each "Completed" row with the following query:

q)btab:update batch:1+til n from tab where status=`Completed
q)btab
shop amount status    date     batch
------------------------------------
A    1234   Cancelled 20101010
A    1234   Cancelled 20101010
A    1234   Completed 20101010 1
A    1234   Cancelled 20101010
A    1234   Completed 20101011 2
A    1000   Completed 20101011 3
B    100    Cancelled 20101011
B    100    Cancelled 20101011
B    4321   Cancelled 20101011
B    4321   Cancelled 20101011
C    333    Cancelled 20101012
C    333    Completed 20101012 4
C    333    Completed 20101012 5
D    111    Cancelled 20101013
D    155    Cancelled 20101013
D    111    Completed 20101013 6
D    155    Completed 20101013 7
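
The two steps above can also be folded into one. This is a minimal sketch, assuming tab is the table built in the question; it relies on the virtual column i, which inside a where-filtered update refers only to the rows that passed the filter.

```q
/ assumes tab is the table built in the question
/ i holds the indices of the rows matching the where clause,
/ so count i is the number of Completed rows
btab:update batch:1+til count i from tab where status=`Completed
```

Rows that do not match the where clause are left with a null batch, as in the output above.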

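Then reverse the table and fill the null batch numbers by date, shop and amount, skipping every cancellation whose amount is a power of 10 (the same logic Terry Lynch uses), and reverse the result back: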

q)ftab:reverse update fills batch by date,shop,amount from reverse btab where not (status=`Cancelled)&{x=`int$x}10 xlog amount
q)ftab
shop amount status    date     batch
------------------------------------
A    1234   Cancelled 20101010 1
A    1234   Cancelled 20101010 1
A    1234   Completed 20101010 1
A    1234   Cancelled 20101010
A    1234   Completed 20101011 2
A    1000   Completed 20101011 3
B    100    Cancelled 20101011
B    100    Cancelled 20101011
B    4321   Cancelled 20101011
B    4321   Cancelled 20101011
C    333    Cancelled 20101012 4
C    333    Completed 20101012 4
C    333    Completed 20101012 5
D    111    Cancelled 20101013 6
D    155    Cancelled 20101013 7
D    111    Completed 20101013 6
D    155    Completed 20101013 7
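
The two building blocks of this step can be tried in isolation. This is a small sketch on made-up vectors (the sample values are illustrative, not taken from the table): fills only carries values forward, so reversing before and after turns it into a backward fill, and the lambda from the where clause flags amounts whose base-10 logarithm is a whole number.

```q
/ backward fill: fills carries the last non-null forward, so reverse around it
reverse fills reverse 0N 0N 1 0N 2     / 1 1 1 2 2

/ power-of-ten test from the where clause, applied to sample amounts
{x=`int$x}10 xlog 100 1234 1000        / 101b
```

Because the fill runs backwards within each date, shop and amount group, a trailing cancellation with no later completion in its group (such as the fourth row for A) keeps a null batch.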

Then select the rows that have a batch number:

q)stab:select from ftab where batch<>0N
q)stab
shop amount status    date     batch
------------------------------------
A    1234   Cancelled 20101010 1
A    1234   Cancelled 20101010 1
A    1234   Completed 20101010 1
A    1234   Completed 20101011 2
A    1000   Completed 20101011 3
C    333    Cancelled 20101012 4
C    333    Completed 20101012 4
C    333    Completed 20101012 5
D    111    Cancelled 20101013 6
D    155    Cancelled 20101013 7
D    111    Completed 20101013 6
D    155    Completed 20101013 7
q)

Finally, here is a query that returns the number of cancellations per batch:

q)select numberOfCancellations:-1+count batch by batch from stab
batch| numberOfCancellations
-----| ---------------------
1    | 2
2    | 0
3    | 0
4    | 1
5    | 0
6    | 1
7    | 1
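
The query above subtracts one because every batch contains exactly one Completed row. An alternative sketch, assuming stab is the filtered table from the previous step, counts the Cancelled rows directly and should give the same counts on this sample data:

```q
/ assumes stab is the filtered table from the previous step
/ summing a boolean vector counts its 1b entries, i.e. the Cancelled rows
select numberOfCancellations:sum status=`Cancelled by batch from stab
```

This form does not depend on the one-completion-per-batch assumption, which may make the intent easier to read.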

Answer 1 (score: 1)

This is not a final query, but it is at least a starting point:

q)select from tab where not (status=`Cancelled)&{x=`int$x}10 xlog amount, ({raze(reverse maxs reverse@)each`Completed=x[`status] group x`amount};([]amount;status)) fby ([]date;shop)
shop amount status    date    
------------------------------
A    1234   Cancelled 20101010
A    1234   Cancelled 20101010
A    1234   Completed 20101010
A    1234   Completed 20101011
A    1000   Completed 20101011
C    333    Cancelled 20101012
C    333    Completed 20101012
C    333    Completed 20101012
D    111    Cancelled 20101013
D    155    Cancelled 20101013
D    111    Completed 20101013
D    155    Completed 20101013

The batching logic can then be handled by a follow-up query.