I have 9 sample transactions over 5 items:
[Table 1]
itemset | TID_set
--------+---------------------------------------
a | 100, 400, 500, 700, 800, 900
b | 100, 200, 300, 400, 600, 800, 900
c | 300, 500, 600, 700, 800, 900
d | 200, 400
e | 100, 800
[Table 2]
itemset | TID_set
--------+----------------------
a, b | 100, 400, 800, 900
a, c | 500, 700, 800, 900
a, d | 400
a, e | 100, 800
b, c | 300, 600, 800, 900
b, d | 200, 400
b, e | 100, 800
c, e | 800
[Table 3]
itemset | TID_set
--------+-----------
a, b, c | 800, 900
a, b, e | 100, 800
I want to use a depth-first search algorithm to produce the data shown in Table 3, but my result differs from Table 3. Here is my source code:
string query = "INSERT INTO table" + (k) + " SELECT DISTINCT ";
for (int i = 1; i <= k - 1; i++)
{
    query = query + "P.itemset" + i + ", ";
}
query = query + "Q.itemset" + (k - 1) + ",(SELECT COUNT(DISTINCT table1.TID_set) FROM table1 WHERE table1.TID_set = ANY(SELECT table1.TID_set FROM table1 WHERE table1.itemset IN( ";
for (int i = 1; i <= k - 1; i++)
{
    query = query + "P.itemset" + i + ",";
}
query = query + "Q.itemset" + (k - 1) + ") GROUP BY table1.TID_set HAVING COUNT(DISTINCT table1.itemset)>=" + k + "))";
query = query + "FROM table" + (k - 1) + " P , table" + (k - 1) + " Q WHERE Q.itemset" + (k - 1) + " > P.itemset" + (k - 1) + " ";
for (int i = 2; i < k - 1; i++)
{
    query = query + "AND P.itemset" + i + " > P.itemset" + (i - 1) + " ";
}
query = query + "ORDER BY ";
for (int i = 1; i <= k - 1; i++)
{
    query = query + "P.itemset" + i + ",";
}
query = query + "Q.itemset" + (k - 1) + "";
Answer 0 (score: 1)
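There is a reason why the famous APRIORI algorithm does not query the database once per itemset combination, but only scans it once per itemset length: even that is already expensive.
Cramming everything into one big SQL query does not help.
Because of the size of the intermediate results, your approach will not scale to any meaningful data set.
It will be much easier if you treat the database simply as a data store: read the transactions from it, and run the actual algorithm in your C# program instead of abusing SQL for something it was not designed for.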
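As a rough illustration of running the algorithm in the C# program, here is a minimal sketch of a depth-first search over TID sets (in the style of Eclat), which is a swapped-in technique and not the asker's SQL approach. It assumes the TID sets of Table 1 have already been read from the database into a dictionary. The names `EclatSketch`, `Mine`, and `Extend` are hypothetical, and the minimum support of 2 mentioned below is an assumption inferred from Table 3, not something stated in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: depth-first itemset mining on a vertical
// (item -> TID set) layout that is already held in memory.
public static class EclatSketch
{
    // Returns every itemset whose TID set has at least minSupport entries,
    // in depth-first order, together with that TID set.
    public static List<(string Itemset, int[] Tids)> Mine(
        IDictionary<string, int[]> vertical, int minSupport)
    {
        var results = new List<(string Itemset, int[] Tids)>();

        // Start from the frequent single items, in a fixed order so that
        // each combination is generated exactly once.
        var items = vertical
            .Where(kv => kv.Value.Length >= minSupport)
            .OrderBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => (Item: kv.Key, Tids: kv.Value))
            .ToList();

        Extend(new List<string>(), items, minSupport, results);
        return results;
    }

    private static void Extend(
        List<string> prefix,
        List<(string Item, int[] Tids)> candidates,
        int minSupport,
        List<(string Itemset, int[] Tids)> results)
    {
        for (int i = 0; i < candidates.Count; i++)
        {
            var (item, tids) = candidates[i];
            prefix.Add(item);
            results.Add((string.Join(", ", prefix), tids));

            // Extend the current itemset with each later item by
            // intersecting TID sets; keep only frequent extensions.
            var next = new List<(string Item, int[] Tids)>();
            for (int j = i + 1; j < candidates.Count; j++)
            {
                int[] shared = tids.Intersect(candidates[j].Tids).ToArray();
                if (shared.Length >= minSupport)
                {
                    next.Add((candidates[j].Item, shared));
                }
            }

            // Go one level deeper before moving on to the next sibling.
            Extend(prefix, next, minSupport, results);
            prefix.RemoveAt(prefix.Count - 1);
        }
    }
}
```

Called with the five TID sets from Table 1 and an assumed minimum support of 2, the only three-item results are `a, b, c` (800, 900) and `a, b, e` (100, 800), which matches Table 3. Note that under the same assumed threshold the pairs `a, d` and `c, e` from Table 2 would be pruned, so Table 2 appears to have been built without a support filter.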