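I need a query that returns "similar" products from my EAV table, meaning products that:
1) share at least one attribute with the same value
2) have no attribute whose value differs from the reference product's
For example: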
ProductID Attribute Value
1 Prop1 1
1 Prop2 2
2 Prop1 1
3 Prop1 1
3 Prop2 3
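In this example, say we search for products similar to product ID 1 (Prop1:1 and Prop2:2). Product 2 would be returned because its Prop1 is 1, but product 3 is not acceptable because its Prop2 differs. And so on.
Each product has a variable number of attributes, so joining the table once per attribute is not an option. Currently I concatenate the list of props to build a dynamic SQL WHERE clause, but I can't find a good (fast?) SQL statement to do this.
Maybe I've spent too much time focused on this problem, but I can't shake the feeling that I'm missing an obvious way to do it...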
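Answer 0: (score: 3)
When faced with this sort of problem, I use TDQD (Test-Driven Query Design).
Note that it helps everyone if you give your table a name!
Pass 1 starts with the products that have at least one attribute/value pair identical to product 1: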
SELECT a.ProductID, COUNT(*) AS matches
FROM EAV_Table AS a
JOIN EAV_Table AS b
ON a.Attribute = b.Attribute AND a.value = b.value
WHERE a.ProductID != 1
AND b.ProductID = 1
GROUP BY a.ProductID
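This obviously won't list any product whose count is 0, which is fine.
Next, the products with at least one attribute whose value differs from product 1's: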
SELECT c.ProductID, COUNT(*) AS matches
FROM EAV_Table AS c
JOIN EAV_Table AS d
ON c.Attribute = d.Attribute AND c.value != d.value
WHERE c.ProductID != 1
AND d.ProductID = 1
GROUP BY c.ProductID
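This won't list products whose count is 0 either, which is more of a nuisance.
We need every product from the first query that is not listed by the second query. This can be expressed with NOT EXISTS and a correlated sub-query: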
SELECT a.ProductID, COUNT(*) AS matches
FROM EAV_Table AS a
JOIN EAV_Table AS b
ON a.Attribute = b.Attribute AND a.value = b.value
WHERE a.ProductID != 1
AND b.ProductID = 1
AND NOT EXISTS
(SELECT c.ProductID
FROM EAV_Table AS c
JOIN EAV_Table AS d
ON c.Attribute = d.Attribute AND c.value != d.value
WHERE c.ProductID != 1
AND d.ProductID = 1
AND c.ProductID = a.ProductID
)
GROUP BY a.ProductID
That's pretty ugly. It works, but it is ugly.
The test data:
CREATE TABLE eav_table
(
productid INTEGER NOT NULL,
attribute CHAR(5) NOT NULL,
value INTEGER NOT NULL,
PRIMARY KEY(productid, attribute, value)
);
INSERT INTO eav_table VALUES(1, 'Prop1', 1);
INSERT INTO eav_table VALUES(1, 'Prop2', 2);
INSERT INTO eav_table VALUES(2, 'Prop1', 1);
INSERT INTO eav_table VALUES(3, 'Prop1', 1);
INSERT INTO eav_table VALUES(3, 'Prop2', 3);
INSERT INTO eav_table VALUES(4, 'Prop1', 1);
INSERT INTO eav_table VALUES(4, 'Prop3', 1);
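Result of the first query (ProductID, matches):
2 1
3 1
4 1
Result of the second query (ProductID, matches):
3 1
Result of the combined NOT EXISTS query (ProductID, matches):
2 1
4 1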
The 1s in the second column are the counts I generated; a more polished rendition would remove them.
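The three queries of this first pass can be checked end to end against the test data. What follows is only a minimal sketch: it assumes SQLite, driven through Python's standard sqlite3 module, as a stand-in for whatever DBMS you actually run, and the names open_sample_db and run_pass1 are made up for illustration. The SQL is the answer's, with the product ID turned into a :pid parameter and an ORDER BY added so the output is deterministic.

```python
import sqlite3

# Rows from the test data above: (productid, attribute, value).
ROWS = [
    (1, "Prop1", 1), (1, "Prop2", 2),
    (2, "Prop1", 1),
    (3, "Prop1", 1), (3, "Prop2", 3),
    (4, "Prop1", 1), (4, "Prop3", 1),
]

# Products sharing at least one attribute/value pair with :pid.
Q_MATCH = """
SELECT a.ProductID, COUNT(*) AS matches
  FROM EAV_Table AS a
  JOIN EAV_Table AS b
    ON a.Attribute = b.Attribute AND a.Value = b.Value
 WHERE a.ProductID != :pid AND b.ProductID = :pid
 GROUP BY a.ProductID
 ORDER BY a.ProductID
"""

# Products with at least one shared attribute whose value differs.
Q_DIFFER = """
SELECT c.ProductID, COUNT(*) AS matches
  FROM EAV_Table AS c
  JOIN EAV_Table AS d
    ON c.Attribute = d.Attribute AND c.Value != d.Value
 WHERE c.ProductID != :pid AND d.ProductID = :pid
 GROUP BY c.ProductID
 ORDER BY c.ProductID
"""

# First query, minus every product that the second query would list.
Q_FINAL = """
SELECT a.ProductID, COUNT(*) AS matches
  FROM EAV_Table AS a
  JOIN EAV_Table AS b
    ON a.Attribute = b.Attribute AND a.Value = b.Value
 WHERE a.ProductID != :pid
   AND b.ProductID = :pid
   AND NOT EXISTS
       (SELECT c.ProductID
          FROM EAV_Table AS c
          JOIN EAV_Table AS d
            ON c.Attribute = d.Attribute AND c.Value != d.Value
         WHERE c.ProductID != :pid
           AND d.ProductID = :pid
           AND c.ProductID = a.ProductID)
 GROUP BY a.ProductID
 ORDER BY a.ProductID
"""


def open_sample_db():
    """Create an in-memory SQLite database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EAV_Table ("
        " ProductID INTEGER NOT NULL,"
        " Attribute CHAR(5) NOT NULL,"
        " Value INTEGER NOT NULL,"
        " PRIMARY KEY (ProductID, Attribute, Value))"
    )
    conn.executemany("INSERT INTO EAV_Table VALUES (?, ?, ?)", ROWS)
    return conn


def run_pass1(pid=1):
    """Run the three Pass 1 queries and return their rows by name."""
    conn = open_sample_db()
    try:
        queries = (("match", Q_MATCH), ("differ", Q_DIFFER), ("final", Q_FINAL))
        return {name: conn.execute(sql, {"pid": pid}).fetchall()
                for name, sql in queries}
    finally:
        conn.close()


if __name__ == "__main__":
    for name, rows in run_pass1(1).items():
        print(name, rows)
```

Keeping each intermediate query runnable on its own is the point of the test-driven approach: when the combined query misbehaves, you can see which building block is at fault.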
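As a second pass, if it can be managed, a nicer final query would join a table listing all the product IDs that have at least one attribute/value pair identical to product ID 1 with a table listing all the product IDs that have zero disagreements with product ID 1.
The first query is the same as the first query in Pass 1, except that we drop the count from the result set.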
SELECT a.ProductID
FROM EAV_Table AS a
JOIN EAV_Table AS b
ON a.Attribute = b.Attribute AND a.value = b.value
WHERE a.ProductID != 1
AND b.ProductID = 1
GROUP BY a.ProductID
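In general, either the GROUP BY clause or a DISTINCT in the select list is necessary (although the sample data does not strictly require it).
For the second query, we exploit the fact that COUNT(column) counts only non-null values, and use a LEFT OUTER JOIN.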
SELECT c.ProductID
FROM EAV_Table AS c
LEFT JOIN EAV_Table AS d
ON c.Attribute = d.Attribute
AND c.Value != d.Value
AND c.ProductID != 1
AND d.ProductID = 1
GROUP BY c.ProductID
HAVING COUNT(d.Value) = 0;
Note that the WHERE clause has been merged into the ON clause; this is actually rather important.
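A quick way to see why the placement matters is to run both variants side by side. The sketch below assumes SQLite via Python's standard sqlite3 module as a stand-in engine; the helper name zero_disagreements and the Q_ON / Q_WHERE constants are hypothetical. Q_ON is the query as written above (with an ORDER BY added), while Q_WHERE moves the two product filters back into a WHERE clause.

```python
import sqlite3

# Sample rows: (productid, attribute, value).
ROWS = [
    (1, "Prop1", 1), (1, "Prop2", 2),
    (2, "Prop1", 1),
    (3, "Prop1", 1), (3, "Prop2", 3),
    (4, "Prop1", 1), (4, "Prop3", 1),
]

# Filters inside ON: rows of c without a disagreement survive the
# LEFT JOIN with NULLs for d, so COUNT(d.Value) can be 0.
Q_ON = """
SELECT c.ProductID
  FROM EAV_Table AS c
  LEFT JOIN EAV_Table AS d
    ON c.Attribute = d.Attribute
   AND c.Value != d.Value
   AND c.ProductID != 1
   AND d.ProductID = 1
 GROUP BY c.ProductID
HAVING COUNT(d.Value) = 0
 ORDER BY c.ProductID
"""

# Filters inside WHERE: "d.ProductID = 1" rejects the NULL-extended
# rows, which quietly turns the outer join back into an inner join.
Q_WHERE = """
SELECT c.ProductID
  FROM EAV_Table AS c
  LEFT JOIN EAV_Table AS d
    ON c.Attribute = d.Attribute
   AND c.Value != d.Value
 WHERE c.ProductID != 1
   AND d.ProductID = 1
 GROUP BY c.ProductID
HAVING COUNT(d.Value) = 0
 ORDER BY c.ProductID
"""


def zero_disagreements(sql):
    """Run one variant against a fresh in-memory copy of the sample data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE EAV_Table ("
            " ProductID INTEGER NOT NULL,"
            " Attribute CHAR(5) NOT NULL,"
            " Value INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO EAV_Table VALUES (?, ?, ?)", ROWS)
        return [row[0] for row in conn.execute(sql)]
    finally:
        conn.close()


if __name__ == "__main__":
    print("ON variant:   ", zero_disagreements(Q_ON))
    print("WHERE variant:", zero_disagreements(Q_WHERE))
```

With the filters in WHERE, every surviving group has at least one non-null d.Value, so the HAVING condition can never hold and the result is empty. The ON variant also lists product 1 itself, because its rows never satisfy c.ProductID != 1 and are therefore NULL-extended; that is harmless once the result is joined against the query for matching products, which excludes product 1.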
We combine the two queries above as joined sub-queries to generate the final result:
SELECT f.ProductID
FROM (SELECT a.ProductID
FROM EAV_Table AS a
JOIN EAV_Table AS b
ON a.Attribute = b.Attribute AND a.value = b.value
WHERE a.ProductID != 1
AND b.ProductID = 1
GROUP BY a.ProductID
) AS e
JOIN (SELECT c.ProductID
FROM EAV_Table AS c
LEFT JOIN EAV_Table AS d
ON c.Attribute = d.Attribute
AND c.Value != d.Value
AND c.ProductID != 1
AND d.ProductID = 1
GROUP BY c.ProductID
HAVING COUNT(d.Value) = 0
) AS f
ON e.ProductID = f.ProductID
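This produces the answer 2 and 4 on the sample data.
Note that part of this exercise is learning not to be satisfied with the first answer you develop. Note, too, that it would be good to benchmark the solutions on full-size data sets rather than on a test data set with only 7 rows in the table.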
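In the same spirit of not stopping at the first answer, one further iteration is possible. It is not part of the original answer: it swaps the two sub-queries for a single join on Attribute with conditional aggregation in the HAVING clause. The sketch assumes SQLite via Python's sqlite3 module; similar_products, Q_PASS2 and Q_SINGLE are hypothetical names. Q_PASS2 is the final query above with the product ID parameterised, kept here only to compare results.

```python
import sqlite3

# Sample rows: (productid, attribute, value).
ROWS = [
    (1, "Prop1", 1), (1, "Prop2", 2),
    (2, "Prop1", 1),
    (3, "Prop1", 1), (3, "Prop2", 3),
    (4, "Prop1", 1), (4, "Prop3", 1),
]

# The two-sub-query formulation from the answer.
Q_PASS2 = """
SELECT f.ProductID
  FROM (SELECT a.ProductID
          FROM EAV_Table AS a
          JOIN EAV_Table AS b
            ON a.Attribute = b.Attribute AND a.Value = b.Value
         WHERE a.ProductID != :pid AND b.ProductID = :pid
         GROUP BY a.ProductID) AS e
  JOIN (SELECT c.ProductID
          FROM EAV_Table AS c
          LEFT JOIN EAV_Table AS d
            ON c.Attribute = d.Attribute
           AND c.Value != d.Value
           AND c.ProductID != :pid
           AND d.ProductID = :pid
         GROUP BY c.ProductID
        HAVING COUNT(d.Value) = 0) AS f
    ON e.ProductID = f.ProductID
 ORDER BY f.ProductID
"""

# Alternative (an assumption, not from the answer): join once on the
# attribute, then count agreements and disagreements per product.
Q_SINGLE = """
SELECT a.ProductID
  FROM EAV_Table AS a
  JOIN EAV_Table AS b
    ON a.Attribute = b.Attribute
 WHERE a.ProductID != :pid
   AND b.ProductID = :pid
 GROUP BY a.ProductID
HAVING SUM(CASE WHEN a.Value =  b.Value THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN a.Value != b.Value THEN 1 ELSE 0 END) = 0
 ORDER BY a.ProductID
"""


def similar_products(sql, pid=1):
    """Return the product IDs that the given query considers similar to pid."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE EAV_Table ("
            " ProductID INTEGER NOT NULL,"
            " Attribute CHAR(5) NOT NULL,"
            " Value INTEGER NOT NULL,"
            " PRIMARY KEY (ProductID, Attribute, Value))"
        )
        conn.executemany("INSERT INTO EAV_Table VALUES (?, ?, ?)", ROWS)
        return [row[0] for row in conn.execute(sql, {"pid": pid})]
    finally:
        conn.close()


if __name__ == "__main__":
    print(similar_products(Q_PASS2), similar_products(Q_SINGLE))
```

Whether the single-join form is actually faster depends on the engine, the indexes and the data volume, which is exactly why benchmarking on realistic data matters more than elegance on seven rows.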
Answer 1: (score: 1)
If I've understood your question correctly, I think this should do the job:
DECLARE @pId INT = 1
SELECT A.pid
FROM (
SELECT pid, count(*) total
FROM t
WHERE pid <> @pId
GROUP BY pid
) A JOIN
(
SELECT pid, count(*) matches
FROM t
WHERE pid<>@pId and att + ':' + convert(varchar(12), val) in (
SELECT att + ':' + convert(varchar(12), val) FROM t
WHERE pid=@pId)
GROUP BY pid
) B ON A.pid = B.pid
WHERE total = matches
Note: edited following the comments that supplied additional data.
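The query above is T-SQL (DECLARE, + for concatenation, CONVERT). To get a feel for how it behaves, here is a rough SQLite port run through Python's sqlite3 module. Several things are assumptions: the port itself (|| instead of +, a bound :pid parameter instead of @pId), the helper name fully_matching, and the sample rows, which are borrowed from the first answer's test data because this answer's own data is not shown. The table name t and the columns pid, att, val follow the query.

```python
import sqlite3

# Sample rows borrowed from the first answer: (pid, att, val).
ROWS = [
    (1, "Prop1", 1), (1, "Prop2", 2),
    (2, "Prop1", 1),
    (3, "Prop1", 1), (3, "Prop2", 3),
    (4, "Prop1", 1), (4, "Prop3", 1),
]

# A product qualifies when the number of its attribute/value pairs
# (total) equals the number of those pairs also found on :pid (matches).
Q_TOTAL_EQUALS_MATCHES = """
SELECT A.pid
  FROM (SELECT pid, COUNT(*) AS total
          FROM t
         WHERE pid <> :pid
         GROUP BY pid) AS A
  JOIN (SELECT pid, COUNT(*) AS matches
          FROM t
         WHERE pid <> :pid
           AND (att || ':' || val) IN
               (SELECT att || ':' || val FROM t WHERE pid = :pid)
         GROUP BY pid) AS B
    ON A.pid = B.pid
 WHERE total = matches
 ORDER BY A.pid
"""


def fully_matching(pid=1):
    """Return products whose every attribute/value pair also occurs on pid."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE t ("
            " pid INTEGER NOT NULL,"
            " att VARCHAR(10) NOT NULL,"
            " val INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
        return [row[0] for row in conn.execute(Q_TOTAL_EQUALS_MATCHES, {"pid": pid})]
    finally:
        conn.close()


if __name__ == "__main__":
    print(fully_matching(1))
```

On these rows the port returns only product 2. Product 4 is rejected because it carries an attribute (Prop3) that product 1 does not have, whereas the first answer's queries accept it; the two answers therefore read requirement 2 of the question differently, and it is worth deciding which reading you need before choosing between them.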
Answer 2: (score: 0)
For completeness, here is a version using a CTE. (Note: this finds all twins, not just those of productId = 1.)
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE eav
( zentity INTEGER NOT NULL
, zattribute varchar NOT NULL
, zvalue INTEGER
, PRIMARY KEY (zentity,zattribute)
);
INSERT INTO eav(zentity, zattribute, zvalue) VALUES
(1, 'Prop1',1) ,(1, 'Prop2',2)
,(2, 'Prop1',1)
,(3, 'Prop1',1) ,(3, 'Prop2',3)
,(4, 'Prop1',1) ,(4, 'Prop3',3) -- added by Jonathan L.
;
-- CTE: pair of entities that have an
-- {attribute,value} in common
WITH pair AS (
SELECT a.zentity AS one
, b.zentity AS two
, a.zattribute AS att
FROM eav a
JOIN eav b ON a.zentity <> b.zentity -- tie-breaker
AND a.zattribute = b.zattribute
AND a.zvalue = b.zvalue
)
SELECT pp.one, pp.two, pp.att
FROM pair pp
-- The Other entity (two) may not have extra attributes
-- NOTE: this NOT EXISTS could be repeated for pair.one, to also
-- suppress the one.* products that have an extra attribute
WHERE NOT EXISTS (
SELECT * FROM eav nx
WHERE nx.zentity = pp.two
AND nx.zattribute <> pp.att
)
ORDER BY pp.one, pp.two, pp.att
;
BTW: the real underlying problem is "relational division". Maybe a newer SQL standard should introduce an operator for it?
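Lacking such an operator, the usual workaround is a "for all" test written as a double NOT EXISTS. The sketch below is a variation on the CTE query, not the author's code: it assumes SQLite via Python's sqlite3 module, the helper names are hypothetical, and entity 5 is an extra row invented purely for illustration (an exact copy of entity 1's attributes). It runs the posted query restricted to one entity next to the division-style variant.

```python
import sqlite3

# Rows from the answer, plus a hypothetical entity 5 added for illustration.
ROWS = [
    (1, "Prop1", 1), (1, "Prop2", 2),
    (2, "Prop1", 1),
    (3, "Prop1", 1), (3, "Prop2", 3),
    (4, "Prop1", 1), (4, "Prop3", 3),
    (5, "Prop1", 1), (5, "Prop2", 2),  # hypothetical: same pairs as entity 1
]

# The posted CTE query, restricted to pairs whose first member is :pid.
Q_AS_POSTED = """
WITH pair AS (
    SELECT a.zentity AS one, b.zentity AS two, a.zattribute AS att
      FROM eav a
      JOIN eav b ON a.zentity <> b.zentity
                AND a.zattribute = b.zattribute
                AND a.zvalue = b.zvalue
)
SELECT DISTINCT pp.two
  FROM pair pp
 WHERE pp.one = :pid
   AND NOT EXISTS (
        SELECT 1 FROM eav nx
         WHERE nx.zentity = pp.two
           AND nx.zattribute <> pp.att)
 ORDER BY pp.two
"""

# Division-style variant: keep b when there is no pair on b that is
# missing from :pid (and b shares at least one pair, via the join).
Q_DIVISION = """
SELECT DISTINCT b.zentity
  FROM eav a
  JOIN eav b ON b.zentity <> a.zentity
            AND b.zattribute = a.zattribute
            AND b.zvalue = a.zvalue
 WHERE a.zentity = :pid
   AND NOT EXISTS (
        SELECT 1 FROM eav nx
         WHERE nx.zentity = b.zentity
           AND NOT EXISTS (
                SELECT 1 FROM eav m
                 WHERE m.zentity = a.zentity
                   AND m.zattribute = nx.zattribute
                   AND m.zvalue = nx.zvalue))
 ORDER BY b.zentity
"""


def _run(sql, pid):
    """Load the rows into a fresh in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE eav ("
            " zentity INTEGER NOT NULL,"
            " zattribute VARCHAR NOT NULL,"
            " zvalue INTEGER,"
            " PRIMARY KEY (zentity, zattribute))"
        )
        conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", ROWS)
        return [row[0] for row in conn.execute(sql, {"pid": pid})]
    finally:
        conn.close()


def twins_as_posted(pid=1):
    return _run(Q_AS_POSTED, pid)


def twins_by_division(pid=1):
    return _run(Q_DIVISION, pid)


if __name__ == "__main__":
    print(twins_as_posted(1), twins_by_division(1))
```

The comparison shows one thing worth knowing about the posted query: because its NOT EXISTS test looks for any attribute other than the one that matched, a candidate with two or more attributes is never returned, even when all of them match (entity 5 here). The double NOT EXISTS form checks each of the candidate's pairs against the reference entity instead, so it keeps entity 5 while still rejecting entities 3 and 4.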