source | target
apple | dog
dog | cat
door | cat
dog | apple
cat | dog

That table is step 1. Running this SQL:
SELECT GREATEST(source,target),LEAST(source,target),COUNT(*) FROM my_table GROUP BY GREATEST(source,target),LEAST(source,target);
returns
dog | apple | 2
dog | cat | 2
door | cat | 1

That result is step 2.
So I want to compute each pair's probability and store it in a new column called "prob", like this:
source | target | prob
apple | dog | 2/(2+2+1)
dog | cat | 2/(2+2+1)
door | cat | 1/(2+2+1)
dog | apple | 2/(2+2+1)
cat | dog | 2/(2+2+1)

That is step 3. How do I get from step 1 to step 3?
Answer 0 (score: 1)
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(source VARCHAR(12) NOT NULL,target VARCHAR(12) NOT NULL
,PRIMARY KEY(source,target)
);
INSERT INTO my_table VALUES
('apple','dog'),
('dog','cat'),
('door','cat'),
('dog','apple'),
('cat','dog');
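-- Count each unordered pair once (GREATEST/LEAST normalize the order),
-- then join the counts back to every row and divide by the total row count.
SELECT x.*
, y.total/(SELECT COUNT(*) FROM my_table) prob
FROM my_table x
JOIN
( SELECT GREATEST(source,target) g,LEAST(source,target) l,COUNT(*) total FROM my_table GROUP BY g,l ) y
ON (y.g = x.source AND y.l = x.target)
OR (y.g = x.target AND y.l = x.source);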
+--------+--------+--------+
| source | target | prob |
+--------+--------+--------+
| apple | dog | 0.4000 |
| dog | apple | 0.4000 |
| cat | dog | 0.4000 |
| dog | cat | 0.4000 |
| door | cat | 0.2000 |
+--------+--------+--------+
...or something like that.
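The SELECT above only reads the probability. Since the question asks for it to be stored in a "prob" column, here is a rough sketch of one way that might be done. It assumes MySQL, and the DECIMAL(5,4) column type is my own choice for illustration, not something from the question. Both subqueries are written as derived tables in the join so that the UPDATE does not read my_table through a plain subquery in the SET clause, which MySQL rejects.

```sql
-- Assumption: the prob column does not exist yet; DECIMAL(5,4) is an illustrative type.
ALTER TABLE my_table ADD COLUMN prob DECIMAL(5,4) NULL;

UPDATE my_table x
JOIN
( -- one row per unordered pair, with the number of rows that contain it
  SELECT GREATEST(source,target) g, LEAST(source,target) l, COUNT(*) total
  FROM my_table
  GROUP BY g, l
) y
ON y.g = GREATEST(x.source, x.target)
AND y.l = LEAST(x.source, x.target)
CROSS JOIN
( -- total number of rows, used as the denominator
  SELECT COUNT(*) n FROM my_table
) t
SET x.prob = y.total / t.n;
```

With the five sample rows, this should leave the same values in prob that the SELECT returns. The stored values go stale if rows are later inserted or deleted, so the UPDATE would need to be rerun after such changes.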