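I'm working with the current SNOMED data and examples, and I want to create a transitive closure table, but something in my default MySQL 5.6 server settings makes it fail.
For those who don't know, SNOMED is a medical database. It has 2.1M relationships and 446697 concepts. The query stalls in the second part, so I guess it has run out of RAM. But which settings should I adjust? join_buffer_size?
Here is the code: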
DELIMITER ;;
CREATE DEFINER=`snomed`@`localhost` PROCEDURE `createTc`()
BEGIN
    drop table if exists tc;
    CREATE TABLE tc (
        source BIGINT UNSIGNED NOT NULL ,
        dest BIGINT UNSIGNED NOT NULL
    ) ENGINE = InnoDB CHARSET=utf8;
    insert into tc (source, dest)
        select distinct rel.sourceid, rel.destinationid
        from rf2_ss_relationships rel
        inner join rf2_ss_concepts con
            on rel.sourceid = con.id and con.active = 1
        where rel.typeid = 116680003 # IS A relationship
        and rel.active = 1;
    REPEAT
        insert into tc (source, dest)
            select distinct b.source, a.dest
            from tc a
            join tc b on a.source = b.dest
            left join tc c on c.source = b.source and c.dest = a.dest
            where c.source is null;
        set @x = row_count();
        select concat('Inserted ', @x);
    UNTIL @x = 0 END REPEAT;
    create index idx_tc_source on tc (source);
    create index idx_tc_dest on tc (dest);
END;;
DELIMITER ;
CREATE TABLE `rf2_ss_relationships` (
`id` bigint(20) unsigned NOT NULL,
`effectiveTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`active` tinyint(4) DEFAULT '1',
`moduleId` bigint(20) unsigned NOT NULL,
`sourceId` bigint(20) unsigned NOT NULL,
`destinationId` bigint(20) unsigned NOT NULL,
`relationshipGroup` bigint(20) unsigned NOT NULL,
`typeId` bigint(20) unsigned NOT NULL,
`characteristicTypeId` bigint(20) unsigned NOT NULL,
`modifierId` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`,`effectiveTime`),
KEY `moduleId_idx` (`moduleId`),
KEY `sourceId_idx` (`sourceId`),
KEY `destinationId_idx` (`destinationId`),
KEY `relationshipGroup_idx` (`relationshipGroup`),
KEY `typeId_idx` (`typeId`),
KEY `characteristicTypeId_idx` (`characteristicTypeId`),
KEY `modifierId_idx` (`modifierId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `rf2_ss_concepts` (
`id` bigint(20) unsigned NOT NULL,
`effectiveTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`active` tinyint(4) DEFAULT NULL,
`moduleId` bigint(20) unsigned NOT NULL,
`definitionStatusId` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`,`effectiveTime`),
KEY `moduleId_idx` (`moduleId`),
KEY `definitionStatusId_idx` (`definitionStatusId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Answer 0 (score: 2)
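I don't know whether this is the best answer, but it works. I changed the CREATE TABLE statement to add the indexes when the table is created, rather than after it has been populated. I also changed the mysqld setting innodb_buffer_pool_size = 8G.
CREATE TABLE tc (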
    source BIGINT UNSIGNED NOT NULL ,
    dest BIGINT UNSIGNED NOT NULL,
    KEY source_idx (source),
    KEY dest_idx (dest)
) ENGINE = InnoDB CHARSET=utf8;
Execution isn't fast, even on my i7 Mac with an SSD, but it works, and the transitive closure table has 5180059 rows.
mysql> call createTc;
+-------------------------+
| concat('Inserted ', @x) |
+-------------------------+
| Inserted 654161 |
+-------------------------+
1 row in set (1 min 55.13 sec)
+-------------------------+
| concat('Inserted ', @x) |
+-------------------------+
| Inserted 1752024 |
+-------------------------+
1 row in set (3 min 5.60 sec)
+-------------------------+
| concat('Inserted ', @x) |
+-------------------------+
| Inserted 2063816 |
+-------------------------+
1 row in set (10 min 42.07 sec)
+-------------------------+
| concat('Inserted ', @x) |
+-------------------------+
| Inserted 275904 |
+-------------------------+
1 row in set (28 min 5.49 sec)
+-------------------------+
| concat('Inserted ', @x) |
+-------------------------+
| Inserted 280 |
+-------------------------+
1 row in set (46 min 29.78 sec)
+-------------------------+
| concat('Inserted ', @x) |
+-------------------------+
| Inserted 0 |
+-------------------------+
1 row in set (1 hour 5 min 20.05 sec)
Query OK, 0 rows affected (1 hour 5 min 20.05 sec)
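Answer 1 (score: 0)
I use this recursive approach. While reading the relationships, I had already added all direct descendants to a list on each concept object (I use Hibernate), so they are available.
Then I walk the list of concepts and start this recursive function. See the example. For each concept, it goes through the list of all its direct parents: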
for (Sct2Concept c : concepts.values()) {
    for (Sct2Relationship parentRelation : c.getChildOfRelationships()) {
        addParentToList(concepts, sct2TransitiveClosureList, parentRelation, c);
    }
}
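As you can see, the in-memory store for the transitive closure is a Set, so the uniqueness check is left to the smart and very mature internal code of the Java library.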
private void addParentToList(Map<String, Sct2Concept> concepts, Set<Sct2TransitiveClosure> sct2TransitiveClosureList, Sct2Relationship parentRelation, Sct2Concept c) {
    // Inactive relationships do not contribute to the closure.
    if (!parentRelation.isActive())
        return;
    Sct2TransitiveClosure tc = new Sct2TransitiveClosure(parentRelation.getDestinationSct2Concept().getId(), c.getId());
    // Set.add returns false when the pair is already present, which ends this branch of the recursion.
    if (sct2TransitiveClosureList.add(tc)) {
        Sct2Concept s = concepts.get(Long.toString(tc.getParentId()));
        // Walk up to the parent's own parents, still on behalf of the same concept c.
        for (Sct2Relationship newParentRelation : s.getChildOfRelationships()) {
            addParentToList(concepts, sct2TransitiveClosureList, newParentRelation, c);
        }
    }
}
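One thing worth spelling out: a HashSet only treats two pairs as the same entry if the element class overrides equals and hashCode. The original Sct2TransitiveClosure class isn't shown, so the following is only a minimal, self-contained sketch of the same recursive idea under that assumption. The names TransitiveClosureSketch, Pair, closure and addParent are hypothetical, and a plain Map of direct parent ids stands in for the Hibernate-backed concept objects.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified stand-in for the Hibernate-backed classes above.
class TransitiveClosureSketch {

    // One (ancestor, descendant) row of the closure. equals/hashCode are
    // overridden so that a HashSet treats equal pairs as duplicates.
    static final class Pair {
        final long parentId;
        final long childId;

        Pair(long parentId, long childId) {
            this.parentId = parentId;
            this.childId = childId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Pair)) return false;
            Pair other = (Pair) o;
            return parentId == other.parentId && childId == other.childId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(parentId, childId);
        }
    }

    // parentsOf maps a concept id to the ids of its direct (active) parents.
    static Set<Pair> closure(Map<Long, List<Long>> parentsOf) {
        Set<Pair> result = new HashSet<>();
        for (Map.Entry<Long, List<Long>> e : parentsOf.entrySet()) {
            for (long parent : e.getValue()) {
                addParent(parentsOf, result, parent, e.getKey());
            }
        }
        return result;
    }

    // Records (parent, child) and, only if the pair is new, walks up to the
    // parent's own parents, still on behalf of the same child.
    private static void addParent(Map<Long, List<Long>> parentsOf, Set<Pair> result,
                                  long parent, long child) {
        if (!result.add(new Pair(parent, child))) {
            return; // pair already recorded; the earlier call handles its ancestors
        }
        for (long grandParent : parentsOf.getOrDefault(parent, Collections.emptyList())) {
            addParent(parentsOf, result, grandParent, child);
        }
    }

    public static void main(String[] args) {
        Map<Long, List<Long>> parentsOf = new HashMap<>();
        parentsOf.put(3L, Arrays.asList(2L)); // 3 IS A 2
        parentsOf.put(2L, Arrays.asList(1L)); // 2 IS A 1
        Set<Pair> tc = closure(parentsOf);
        System.out.println(tc.size()); // prints 3: (2,3), (1,3), (1,2)
    }
}
```

Without the equals/hashCode overrides, add would return true for every new object, duplicates would pile up, and the early return that prunes the recursion would never trigger.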