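I have a large database of EAV-structured data that has to be searchable and pageable. I have tried every trick in the book to make it fast enough, but in some cases it still fails to complete in a reasonable time.
This is my table structure (relevant parts only; ask if you need more):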
CREATE TABLE IF NOT EXISTS `object` (
`object_id` bigint(20) NOT NULL AUTO_INCREMENT,
`oid` varchar(32) CHARACTER SET utf8 NOT NULL,
`status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
`created` datetime NOT NULL,
`updated` datetime NOT NULL,
PRIMARY KEY (`object_id`),
UNIQUE KEY `oid` (`oid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `version` (
`version_id` bigint(20) NOT NULL AUTO_INCREMENT,
`type_id` bigint(20) NOT NULL,
`object_id` bigint(20) NOT NULL,
`created` datetime NOT NULL,
`status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`version_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `value` (
`value_id` bigint(20) NOT NULL AUTO_INCREMENT,
`object_id` int(11) NOT NULL,
`attribute_id` int(11) NOT NULL,
`version_id` bigint(20) NOT NULL,
`type_id` bigint(20) NOT NULL,
`value` text NOT NULL,
PRIMARY KEY (`value_id`),
KEY `field_id` (`attribute_id`),
KEY `action_id` (`version_id`),
KEY `form_id` (`type_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here is a sample object. I have 1 million of them in my database. Each object may have a different number of attributes, with different attribute_ids.
INSERT INTO `object` (`object_id`, `oid`, `status`, `created`, `updated`) VALUES (1, 'cwnzrdxs4dzxns47xs4tx', 'Green', NOW(), NOW());
INSERT INTO `version` (`version_id`, `type_id`, `object_id`, `created`, `status`) VALUES (1, 1, 1, NOW(), NOW());
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (1, 1, 1, 1, 1, 'Munich');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (2, 1, 2, 1, 1, 'Germany');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (3, 1, 3, 1, 1, '123');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (4, 1, 4, 1, 1, '2012-01-13');
INSERT INTO `value` (`value_id`, `object_id`, `attribute_id`, `version_id`, `type_id`, `value`) VALUES (5, 1, 5, 1, 1, 'A cake!');
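Now to my current mechanism. My first attempt was the typical MySQL approach: one huge SQL query with loads of joins for everything I need. A complete disaster! It took far too long to load and even crashed the PHP and MySQL servers because they ran out of RAM.
So I split the query into several steps:
1. Determine all required attribute_ids.
I can look them up in another table that references an object's type_id. The result is a list of attribute_ids. (This table is not relevant to performance, so it is not included in my sample.)
:type_id contains the type_ids of all objects I want to include in the search. I already have this information in my application, so this step is cheap.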
SELECT * FROM attribute WHERE form_id IN (:type_id)
The result is an array of attribute_id integers.
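2. Search for matching objects. I compile one big SQL query that adds an INNER JOIN for every condition I want. This sounds horrible, but in the end it was the fastest method :(
A typical generated query may look like this. The LIMIT is sadly necessary; without it I may get so many IDs that the resulting array makes PHP explode or breaks the IN statement in the next query: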
SELECT DISTINCT `version`.object_id FROM `version`
INNER JOIN `version` AS condition1
ON `version`.version_id = condition1.version_id
AND condition1.created = '2012-03-04' -- Filter by version date
INNER JOIN `value` AS condition2
ON `version`.version_id = condition2.version_id
AND condition2.type_id IN (:type_id) -- try to limit joins to object types we need
AND condition2.attribute_id = :field_id2 -- searching for a value in a specific attribute
AND condition2.value = 'Munich' -- searching for the value 'Munich'
INNER JOIN `value` AS condition3
ON `version`.version_id = condition3.version_id
AND condition3.type_id IN (:type_id) -- try to limit joins to object types we need
AND condition3.attribute_id = :field_id3 -- searching for a value in a specific attribute
AND condition3.value = 'Green' -- searching for the value 'Green'
WHERE `version`.type_id IN (:type_id) ORDER BY `version`.version_id DESC LIMIT 10000
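The result contains the object_ids of every object I may need. I select object_ids rather than version_ids because I need all versions of the matching objects, regardless of which version matched.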
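For illustration, the query generation in step 2 can be sketched like this. It is only a sketch under assumptions: it is written in Python with an in-memory SQLite database standing in for PHP and MySQL, the tables are trimmed to the columns the search touches, and the function name `build_search_query` and the `(attribute_id, value)` shape of a condition are made up for the example.

```python
import sqlite3


def build_search_query(type_ids, conditions, limit=10000):
    """Return (sql, params) with one INNER JOIN on `value` per condition.

    `conditions` is a list of (attribute_id, value) pairs (hypothetical shape).
    """
    type_marks = ", ".join("?" for _ in type_ids)
    joins, params = [], []
    for i, (attribute_id, value) in enumerate(conditions, start=1):
        alias = f"condition{i}"
        joins.append(
            f"INNER JOIN value AS {alias}"
            f" ON version.version_id = {alias}.version_id"
            f" AND {alias}.type_id IN ({type_marks})"
            f" AND {alias}.attribute_id = ?"
            f" AND {alias}.value = ?"
        )
        # Parameters are appended in the same order as the placeholders.
        params += [*type_ids, attribute_id, value]
    sql = (
        "SELECT DISTINCT version.object_id FROM version "
        + " ".join(joins)
        + f" WHERE version.type_id IN ({type_marks})"
        + " ORDER BY version.version_id DESC LIMIT ?"
    )
    params += [*type_ids, limit]
    return sql, params


# Tiny stand-in data set: object 1 matches both conditions, object 2 only one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE version (version_id INTEGER PRIMARY KEY,
                          type_id INTEGER, object_id INTEGER);
    CREATE TABLE value (value_id INTEGER PRIMARY KEY, attribute_id INTEGER,
                        version_id INTEGER, type_id INTEGER, value TEXT);
    INSERT INTO version VALUES (1, 1, 1), (2, 1, 2);
    INSERT INTO value VALUES (1, 1, 1, 1, 'Munich'), (2, 2, 1, 1, 'Green'),
                             (3, 1, 2, 1, 'Munich'), (4, 2, 2, 1, 'Red');
""")
sql, params = build_search_query([1], [(1, "Munich"), (2, "Green")])
found = [row[0] for row in conn.execute(sql, params)]
print(found)
```

Binding every value as a parameter keeps the generated SQL free of user input, at the cost of repeating the type_id list once per join.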
3. Sort and paginate the results. Next, I create a query that sorts the objects by a certain attribute and then paginates the resulting array.
SELECT DISTINCT object_id
FROM value
WHERE object_id IN (:foundObjects)
AND attribute_id = :attribute_id_to_sort
AND value > ''
ORDER BY value ASC LIMIT :limit OFFSET :offset
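The result is a sorted and paginated list of object IDs from the previous search.
4. Get our complete objects, versions and attributes. In the last step, I select all values for every object and version found by the previous queries.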
SELECT `value`.*, `object`.*, `version`.*, `type`.*,
`object`.status AS `object.status`,
`object`.flag AS `object.flag`,
`version`.created AS `version.created`,
`version`.status AS `version.status`
FROM version
INNER JOIN `type` ON `version`.type_id = `type`.type_id
INNER JOIN `object` ON `version`.object_id = `object`.object_id
LEFT JOIN value ON `version`.version_id = `value`.version_id
WHERE version.object_id IN (:sortedObjectIds) AND `version`.type_id IN (:typeIds)
ORDER BY version.created DESC
The result is then compiled by PHP into a nice object->version->value array structure.
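The grouping of the flat joined rows into an object->version->value structure can be sketched as follows. This is a Python illustration rather than the PHP code actually used; the function name `compile_objects` and the row keys (taken from the column aliases in the query above, plus `attribute_id` and `value`) are assumptions.

```python
def compile_objects(rows):
    """Group flat joined rows into object -> versions -> values.

    Each row is a dict keyed by column name (hypothetical shape). Dicts keep
    insertion order, so the ordering produced by the SQL query is preserved.
    """
    objects = {}
    for row in rows:
        obj = objects.setdefault(row["object_id"], {
            "object_id": row["object_id"],
            "status": row["object.status"],
            "versions": {},
        })
        ver = obj["versions"].setdefault(row["version_id"], {
            "version_id": row["version_id"],
            "created": row["version.created"],
            "values": {},
        })
        # With a LEFT JOIN, a version without values yields one row of NULLs.
        if row["attribute_id"] is not None:
            ver["values"][row["attribute_id"]] = row["value"]
    return objects


rows = [
    {"object_id": 1, "object.status": "Green", "version_id": 2,
     "version.created": "2012-03-05", "attribute_id": 1, "value": "Berlin"},
    {"object_id": 1, "object.status": "Green", "version_id": 2,
     "version.created": "2012-03-05", "attribute_id": 2, "value": "Germany"},
    {"object_id": 1, "object.status": "Green", "version_id": 1,
     "version.created": "2012-03-04", "attribute_id": 1, "value": "Munich"},
    {"object_id": 2, "object.status": None, "version_id": 3,
     "version.created": "2012-03-04", "attribute_id": None, "value": None},
]
result = compile_objects(rows)
print(result[1]["versions"][2]["values"])
```

A single pass over the rows is enough because every row carries its object and version keys, which fits the small compile time reported for this step.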
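Now the questions:
If all else fails, should I maybe switch database technology? See my other question: Database optimized for searching in large number of objects with different attributes
Real-life sample
Table sizes: object - 193801 rows, version - 193841 rows, value - 1053928 rows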
SELECT * FROM attribute WHERE attribute_id IN (30)
SELECT DISTINCT `version`.object_id
FROM version
INNER JOIN value AS condition_d4e328e33813
ON version.version_id = condition_d4e328e33813.version_id
AND condition_d4e328e33813.type_id IN (30)
AND condition_d4e328e33813.attribute_id IN (377)
AND condition_d4e328e33813.value LIKE '%e%'
INNER JOIN value AS condition_2c870b0a429f
ON version.version_id = condition_2c870b0a429f.version_id
AND condition_2c870b0a429f.type_id IN (30)
AND condition_2c870b0a429f.attribute_id IN (376)
AND condition_2c870b0a429f.value LIKE '%s%'
WHERE version.type_id IN (30)
ORDER BY version.version_id DESC LIMIT 10000 -- limit to 10000 or it breaks!
EXPLAIN:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | condition_2c870b0a429f | ref | field_id,action_id,form_id | field_id | 4 | const | 178639 | Using where; Using temporary; Using filesort
1 | SIMPLE | action | eq_ref | PRIMARY | PRIMARY | 8 | condition_2c870b0a429f.action_id | 1 | Using where
1 | SIMPLE | condition_d4e328e33813 | ref | field_id,action_id,form_id | action_id | 8 | action.action_id | 11 | Using where; Distinct
Object search completed (peak RAM: 5.91MB, time: 4.64s)
SELECT DISTINCT object_id
FROM version
WHERE object_id IN (193793,193789, ... ,135326,135324) -- 10000 ids in here!
ORDER BY created ASC
LIMIT 50 OFFSET 0
Object sorting completed (peak RAM: 6.68MB, time: 0.352s)
SELECT `value`.*, object.*, version.*, type.*,
object.status AS `object.status`,
object.flag AS `object.flag`,
version.created AS `version.created`,
version.status AS `version.status`,
version.flag AS `version.flag`
FROM version
INNER JOIN type ON version.type_id = type.type_id
INNER JOIN object ON version.object_id = object.object_id
LEFT JOIN value ON version.version_id = `value`.version_id
WHERE version.object_id IN (135324,135326,...,135658,135661) AND version.type_id IN (30)
ORDER BY quality DESC, version.created DESC
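Object load query completed (peak RAM: 6.68MB, time: 0.083s)
Object compilation into array completed (peak RAM: 6.68MB, time: 0.007s)
Answer 0: (score: 0)
Try adding an EXPLAIN in front of your search query: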
EXPLAIN SELECT DISTINCT `version`.object_id FROM `version`, etc ...
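Then check the results in the "Extra" column. It will give you some clues for speeding up your query, such as adding an INDEX on the right fields.
Sometimes you can remove an INNER JOIN, get more rows back from MySQL, and filter the big array with a loop in PHP.
Answer 1: (score: 0)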
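I would first try covering indexes (that is, indexes containing all the columns that match your query criteria, and even the columns pulled out as the result). That way the engine does not have to go back to the raw page data.
Since you need "object_id" from version, you use "version_id" as the join basis to the other tables, and your version table also has a WHERE clause on type_id, I would have an index on
version table - (object_id, version_id, type_id)
For your "value" table, likewise matching the criteria
value table - (version_id, type_id, attribute_id, value, created)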
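As a rough illustration of what "covering" means, the following sketch builds the suggested index on the value table and asks the query planner how it would run a lookup. It makes several assumptions: SQLite (through Python's sqlite3) stands in for MySQL, the index name `idx_value_cover` is made up, and `created` is left out because the value table as posted has no such column. In MySQL itself a TEXT column needs a prefix length inside an index, and a prefix index cannot, as far as I know, act as a covering index for that column, so the result may not carry over unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE value (
        value_id INTEGER PRIMARY KEY,
        object_id INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL,
        version_id INTEGER NOT NULL,
        type_id INTEGER NOT NULL,
        value TEXT NOT NULL
    );
    -- Hypothetical index name; column order follows the suggestion above.
    CREATE INDEX idx_value_cover
        ON value (version_id, type_id, attribute_id, value);
""")

# Every column the query reads is part of the index, so the planner can
# answer it from the index alone without visiting the table rows.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT value FROM value "
    "WHERE version_id = ? AND type_id = ? AND attribute_id = ?",
    (1, 30, 377),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

If the plan mentions a covering index, the lookup is served from the index only; selecting a column that is not in the index (for example `object_id`) would force a visit to the table rows again.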