I have the following table:
CREATE TABLE `Triples` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`Subject` longtext COLLATE utf8mb4_unicode_ci,
`Predicate` longtext COLLATE utf8mb4_unicode_ci,
`Object` longtext COLLATE utf8mb4_unicode_ci,
`SubHash` binary(16) DEFAULT NULL,
`PredHash` binary(16) DEFAULT NULL,
`ObHash` binary(16) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `PredHash` (`PredHash`),
KEY `ObHash` (`ObHash`),
KEY `SubHash` (`SubHash`)
) ENGINE=InnoDB;
It contains roughly 800 million rows. Now I want to create two more tables. One of them is the Nodes table:
CREATE TABLE `Nodes` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`val_hash` binary(16) NOT NULL,
`val` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
`subjectCount` bigint(20) unsigned NOT NULL DEFAULT '0',
`objectCount` bigint(20) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
);
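It will store every subject and object from Triples exactly once (val_hash will become a key after the data has been inserted). The question is which INSERT performs better: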
INSERT INTO Nodes(val,val_hash)
SELECT j.val, j.val_hash
FROM (SELECT Subject AS val, SubHash AS val_hash
      FROM Triples GROUP BY val_hash
      UNION
      SELECT Object AS val, ObHash AS val_hash
      FROM Triples GROUP BY val_hash) AS j
GROUP BY j.val_hash;
Or the following:
ALTER TABLE Nodes
ADD UNIQUE KEY(val_hash);
INSERT IGNORE INTO Nodes(val,val_hash)
SELECT Subject,SubHash FROM Triples GROUP BY SubHash;
INSERT IGNORE INTO Nodes(val,val_hash)
SELECT Object,ObHash FROM Triples GROUP BY ObHash;
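I ask because the unique key adds overhead to every insert, whereas the UNION requires a temporary table, and I don't know which of the two is better in this case.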
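For reference, here is a minimal sketch of how the two variants could be timed on a sample before committing to the full 800 million rows. It rests on several assumptions. The scratch tables Triples_sample, Nodes_union and Nodes_ignore are hypothetical names and not part of my schema. The sample size of one million ids is arbitrary. The server is assumed to be MySQL 5.7 or later, so that ANY_VALUE() is available and the queries stay valid under ONLY_FULL_GROUP_BY. The hash and text columns are assumed to be non-NULL in every sampled row, since Nodes declares val and val_hash as NOT NULL.

```sql
-- Hypothetical scratch copies; none of these tables exist in the schema above.
CREATE TABLE Triples_sample LIKE Triples;    -- copies columns and the three hash indexes
INSERT INTO Triples_sample
SELECT * FROM Triples WHERE id <= 1000000;   -- arbitrary sample size

CREATE TABLE Nodes_union  LIKE Nodes;        -- target for variant 1
CREATE TABLE Nodes_ignore LIKE Nodes;        -- target for variant 2

-- Variant 1: deduplicate inside one SELECT, then add the key afterwards.
INSERT INTO Nodes_union (val, val_hash)
SELECT ANY_VALUE(j.val), j.val_hash
FROM (SELECT ANY_VALUE(Subject) AS val, SubHash AS val_hash
      FROM Triples_sample GROUP BY SubHash
      UNION
      SELECT ANY_VALUE(Object) AS val, ObHash AS val_hash
      FROM Triples_sample GROUP BY ObHash) AS j
GROUP BY j.val_hash;
ALTER TABLE Nodes_union ADD UNIQUE KEY (val_hash);   -- include this in the timing

-- Variant 2: create the key first and let INSERT IGNORE drop duplicates.
ALTER TABLE Nodes_ignore ADD UNIQUE KEY (val_hash);
INSERT IGNORE INTO Nodes_ignore (val, val_hash)
SELECT ANY_VALUE(Subject), SubHash FROM Triples_sample GROUP BY SubHash;
INSERT IGNORE INTO Nodes_ignore (val, val_hash)
SELECT ANY_VALUE(Object), ObHash FROM Triples_sample GROUP BY ObHash;

-- Sanity check: both variants should produce the same number of nodes.
SELECT (SELECT COUNT(*) FROM Nodes_union)  AS union_rows,
       (SELECT COUNT(*) FROM Nodes_ignore) AS ignore_rows;
```

The mysql command-line client prints the elapsed time after each statement, so the totals for the two variants can be compared directly. A sample this small may well fit in memory, so the result would only be a rough indication of how either variant behaves at full scale.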