Question

In kdb+, I have the following table:

q)tab:flip `items`sales`prices!(`nut`bolt`cam`cog`bolt`screw;6 8 0 3 0n 0n;10  20 15 20 0n 0n)
q)tab

items sales prices
------------------
nut   6     10
bolt  8     20
cam   0     15
cog   3     20
bolt
screw

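In this table there are two rows for the same item ('bolt'). Because the first 'bolt' row contains more information, I want to delete the "smaller" 'bolt' row.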

Desired result:

items sales prices
------------------
nut   6     10
bolt  8     20
cam   0     15
cog   3     20
screw

As I understand it, if I use the 'distinct' function, it is not deterministic which row is kept?

Answer 1

One way is to forward-fill by item, so the second bolt inherits the values of the first:

q)update fills sales,fills prices by items from tab
items sales prices
------------------
nut   6     10
bolt  8     20
cam   0     15
cog   3     20
bolt  8     20
screw
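Filling alone leaves two identical bolt rows. A minimal sketch of the remaining step, assuming the fuller row always comes first for each item (fills only propagates values forward): apply distinct to the filled table, which keeps the first occurrence of each row. The names `filled` and `res` are illustrative.

```q
tab:flip `items`sales`prices!(`nut`bolt`cam`cog`bolt`screw;6 8 0 3 0n 0n;10 20 15 20 0n 0n)
/ forward-fill nulls within each item group; the second bolt becomes a copy of the first
filled:update fills sales,fills prices by items from tab
/ the two bolt rows are now identical, so distinct drops the later copy (first occurrence kept)
res:distinct filled
```

If a later duplicate carried different non-null values, the rows would still differ after filling and distinct would keep both.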

This can also be done in functional form, passing the table and the by column(s):

{![x;();(!). 2#enlist(),y;{x!fills,/:x}cols[x]except y]}[tab;`items]
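To check that the functional form does the same thing as the q-sql update, the two results can be compared with match. A sketch; `ff` is just a name given to the answer's lambda here. Because of `(),y`, the by argument should also accept a list of columns.

```q
tab:flip `items`sales`prices!(`nut`bolt`cam`cog`bolt`screw;6 8 0 3 0n 0n;10 20 15 20 0n 0n)
/ the answer's functional update: x is the table, y the by column(s)
/ (!). 2#enlist(),y builds the by dict y!y; {x!fills,/:x} builds col!(fills;col) for the other columns
ff:{![x;();(!). 2#enlist(),y;{x!fills,/:x}cols[x]except y]}
same:(update fills sales,fills prices by items from tab)~ff[tab;`items]
```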

If "more information" means "fewest nulls", you can count the nulls in each row and return only the rows with the fewest nulls for their item:

q)select from @[tab;`n;:;sum each null tab] where n=(min;n)fby items
items sales prices n
--------------------
nut   6     10     0
bolt  8     20     0
cam   0     15     0
cog   3     20     0
screw              2

This approach is not recommended, though, because it works row by row rather than column by column.
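A column-wise variant of the same idea, sketched here as an alternative rather than the answer's method (and not benchmarked): `null` is atomic, so applying it to `value flip tab` (the list of column vectors) yields one boolean vector per column, and `sum` adds those vectors element-wise into a per-row null count. `nc` and `res` are illustrative names; the helper column n is dropped at the end.

```q
tab:flip `items`sales`prices!(`nut`bolt`cam`cog`bolt`screw;6 8 0 3 0n 0n;10 20 15 20 0n 0n)
/ per-row null count, computed column by column: 0 0 0 0 2 2
nc:sum null value flip tab
/ keep rows whose null count is the minimum for their item, then remove the helper column
res:delete n from select from (update n:nc from tab) where n=(min;n) fby items
```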

Answer 2

Because the two bolt rows contain different data, distinct treats them as different rows.
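This can be checked directly: distinct compares whole rows and keeps the first occurrence of each, so its result is deterministic, but it cannot collapse the two bolt rows because they differ. A sketch using the question's table; `d` is an illustrative name.

```q
tab:flip `items`sales`prices!(`nut`bolt`cam`cog`bolt`screw;6 8 0 3 0n 0n;10 20 15 20 0n 0n)
/ distinct compares entire rows; the two bolt rows differ, so all 6 rows survive
d:distinct tab
/ distinct on the items column alone lists each item once, in order of first appearance
u:exec distinct items from tab
```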

It depends on how you define "more information". You may need to provide more examples, but here are some possibilities:

Delete rows where sales is null

q)delete from tab where null sales
items sales prices
------------------
nut   6     10    
bolt  8     20    
cam   0     15    
cog   3     20
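This also removes screw, which the question wants to keep. One possible refinement, assuming a "duplicate" means a repeated items value, is to delete null-sales rows only for items that appear more than once; `(count;sales) fby items` counts the rows in each item group, nulls included. A sketch; `res` is an illustrative name.

```q
tab:flip `items`sales`prices!(`nut`bolt`cam`cog`bolt`screw;6 8 0 3 0n 0n;10 20 15 20 0n 0n)
/ combine both conditions with & so fby sees the whole table
res:delete from tab where (null sales)&1<(count;sales) fby items
```

The conditions are joined with `&` rather than a comma because comma-separated where constraints are applied cumulatively: an fby in a later constraint would only see the rows that survived the earlier ones.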

Keep the row with the highest sales value (sales*prices) for each item

q)select from tab where (sales*prices) = (max;sales*prices) fby items
items sales prices
------------------
nut   6     10    
bolt  8     20    
cam   0     15    
cog   3     20
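The screw row disappears here because of how max treats nulls: it ignores them, and a group whose values are all null yields negative infinity, which does not equal the null product. A small check of that reasoning:

```q
/ max ignores nulls; an all-null list gives negative infinity
a:max 0n 0n
/ the null product for screw does not equal -0w, so the row fails the where clause
e:0n=-0w
/ for bolt the maximum is 160, so only the first bolt row (8*20) matches
b:max 160 0n
```

Note also that `=` keeps every row tied for the maximum, so an item with two equally large rows would still appear twice.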
