Question

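This is a fairly simple problem. However, I haven't been able to come up with any reasonable solution, so a solution may or may not be easy to cook up. Here is the problem: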

Suppose there are many records describing some objects. For example:

{
  id         : 1,
  kind       : cat,
  weight     : 25 lb,
  color      : red,
  age        : 10,
  fluffiness : 98,
  attitude   : grumpy
}

{
  id       : 2,
  kind     : robot,
  chassis  : aluminum,
  year     : 2015,
  hardware : intel curie,
  battery  : 5000,
  bat-life : 168,
  weight   : 0.5 lb
}

{
  id       : 3,
  kind     : lightsaber,
  color    : red,
  type     : single blade,
  power    : 1000,
  weight   : 25 lb,
  creator  : Darth Vader
}

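The attributes are not specified in advance, so an object can be described by any set of attribute-value pairs. With 1,000,000 records/objects, there could easily be 100,000 different attributes.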

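My goal is to search a data structure holding all the records efficiently and, where possible, to answer (quickly) which records match given criteria.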

For example, a search query could be: Find all cats that weigh more than 20 and are older than 9 and are more fluffy than 98 and are red and whose attitude is "grumpy".

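We can assume there may be an unbounded number of records and an unbounded number of attributes, but every search query contains at most 20 numeric (lt, gt) clauses.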
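To pin down what a query means, here is a minimal brute-force baseline in Python, under some assumptions: records are plain dicts, the units are dropped from `weight`, and a query is a list of hypothetical `(attribute, op, value)` clauses where `op` is `"eq"`, `"lt"` or `"gt"`. It touches every record for every query, which is the cost any index has to beat.

```python
import operator

# Hypothetical clause operators: "eq" for exact match, "lt"/"gt" for numeric ranges.
OPS = {"eq": operator.eq, "lt": operator.lt, "gt": operator.gt}

# The three example records; units are dropped from weight (an assumption).
RECORDS = [
    {"id": 1, "kind": "cat", "weight": 25, "color": "red", "age": 10,
     "fluffiness": 98, "attitude": "grumpy"},
    {"id": 2, "kind": "robot", "chassis": "aluminum", "year": 2015,
     "hardware": "intel curie", "battery": 5000, "bat-life": 168, "weight": 0.5},
    {"id": 3, "kind": "lightsaber", "color": "red", "type": "single blade",
     "power": 1000, "weight": 25, "creator": "Darth Vader"},
]


def matches(record, clauses):
    """True if the record has every queried attribute and satisfies every clause."""
    for attr, op, value in clauses:
        # A missing attribute never matches.
        if attr not in record or not OPS[op](record[attr], value):
            return False
    return True


def linear_scan(records, clauses):
    """Brute force: test every record against every clause, O(records * clauses)."""
    return [r["id"] for r in records if matches(r, clauses)]


if __name__ == "__main__":
    query = [("kind", "eq", "cat"), ("weight", "gt", 20), ("age", "gt", 9),
             ("fluffiness", "gt", 98), ("color", "eq", "red"),
             ("attitude", "eq", "grumpy")]
    print(linear_scan(RECORDS, query))      # [] since fluffiness 98 is not > 98
    print(linear_scan(RECORDS, query[:3]))  # [1]
```

The question's own sample query matches nothing against these three records, because the cat's fluffiness is exactly 98 and the clause asks for more than 98.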

One possible implementation I can think of in SQL/MySQL uses a full-text index.

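For example, I could store the non-numeric attributes as a string such as "kind_cat color_red attitude_grumpy", search those to narrow down the result set, and then scan a table containing the matching numeric attributes. However, it seems (I'm not sure at the moment) that gt/lt lookups in general could be expensive with this strategy (I would need at least N joins for N numeric clauses).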
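The full-text idea can be sketched in memory as an inverted index: each non-numeric attribute becomes a token such as `kind_cat`, each token maps to the set of record ids containing it, and the numeric clauses are then checked only on the records that survive the token intersection. This is a Python sketch of the strategy, not MySQL's FULLTEXT implementation; the function names and the `(attr, op, bound)` clause format are illustrative assumptions.

```python
from collections import defaultdict

# Example records keyed by id; weights without units (an assumption).
RECORDS = {
    1: {"kind": "cat", "weight": 25, "color": "red", "age": 10,
        "fluffiness": 98, "attitude": "grumpy"},
    2: {"kind": "robot", "chassis": "aluminum", "year": 2015,
        "hardware": "intel curie", "battery": 5000, "bat-life": 168, "weight": 0.5},
    3: {"kind": "lightsaber", "color": "red", "type": "single blade",
        "power": 1000, "weight": 25, "creator": "Darth Vader"},
}


def build_token_index(records):
    """Inverted index: 'attr_value' token -> set of ids, for string attributes only."""
    index = defaultdict(set)
    for rid, rec in records.items():
        for attr, value in rec.items():
            if isinstance(value, str):
                index[f"{attr}_{value}"].add(rid)
    return index


def search(records, index, tokens, numeric_clauses):
    """Narrow by tokens first, then check (attr, 'gt'|'lt', bound) clauses."""
    # Step 1: intersect the posting sets of all required tokens.
    candidates = set(records)
    for token in tokens:
        candidates &= index.get(token, set())
    # Step 2: scan only the surviving candidates for the numeric clauses.
    result = []
    for rid in sorted(candidates):
        rec = records[rid]
        if all(attr in rec and (rec[attr] > bound if op == "gt" else rec[attr] < bound)
               for attr, op, bound in numeric_clauses):
            result.append(rid)
    return result


index = build_token_index(RECORDS)
print(search(RECORDS, index, ["kind_cat", "color_red", "attitude_grumpy"],
             [("weight", "gt", 20), ("age", "gt", 9)]))  # [1]
```

When the tokens are selective, step 2 touches few records; when they are not (for example a query with only `color_red`), step 2 degrades toward a scan, which is the concern raised above about gt/lt clauses.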

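I have thought about MongoDB for this problem, but although MongoDB naturally lets me store key-value pairs, searching by some fields (not all of them) means I would have to create indexes covering all the keys in every possible order/permutation (which is impossible).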

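Can this be done efficiently (maybe in logarithmic time?) with MySQL or any other DBMS? If not, is there a data structure (maybe some kind of multidimensional tree?) and an algorithm that would allow this kind of search to be performed efficiently at scale (in terms of both time and space complexity)?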
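For the numeric clauses, one way to get a logarithmic component is to keep a separate sorted list per attribute and use binary search for each gt/lt bound. Locating a range is O(log n), but reading out the matching ids and intersecting the sets still costs time proportional to how many records each clause matches, so the whole query is not logarithmic. This is a minimal in-memory sketch, assuming numeric values without units; `NumericAttributeIndex` is a hypothetical name, and this is not a multidimensional tree.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Example records keyed by id; numeric values without units (an assumption).
RECORDS = {
    1: {"kind": "cat", "weight": 25, "color": "red", "age": 10,
        "fluffiness": 98, "attitude": "grumpy"},
    2: {"kind": "robot", "chassis": "aluminum", "year": 2015,
        "hardware": "intel curie", "battery": 5000, "bat-life": 168, "weight": 0.5},
    3: {"kind": "lightsaber", "color": "red", "type": "single blade",
        "power": 1000, "weight": 25, "creator": "Darth Vader"},
}


class NumericAttributeIndex:
    """One sorted list of (value, id) pairs per numeric attribute."""

    def __init__(self, records):
        by_attr = defaultdict(list)
        for rid, rec in records.items():
            for attr, value in rec.items():
                if isinstance(value, (int, float)):
                    by_attr[attr].append((value, rid))
        self._entries = {attr: sorted(pairs) for attr, pairs in by_attr.items()}
        # Parallel lists of bare values, so bisect can search them directly.
        self._values = {attr: [v for v, _ in pairs]
                        for attr, pairs in self._entries.items()}

    def range_ids(self, attr, op, bound):
        """Ids with attr > bound ('gt') or attr < bound ('lt'), via binary search."""
        entries = self._entries.get(attr, [])
        values = self._values.get(attr, [])
        if op == "gt":
            return {rid for _, rid in entries[bisect_right(values, bound):]}
        if op == "lt":
            return {rid for _, rid in entries[:bisect_left(values, bound)]}
        raise ValueError(f"unsupported op: {op}")

    def query(self, clauses):
        """Intersect the id sets of all (attr, op, bound) clauses."""
        result = None
        for attr, op, bound in clauses:
            ids = self.range_ids(attr, op, bound)
            result = ids if result is None else result & ids
            if not result:
                break  # an empty intersection cannot grow again
        return sorted(result) if result else []


idx = NumericAttributeIndex(RECORDS)
print(idx.query([("weight", "gt", 20), ("age", "gt", 9)]))  # [1]
```

Storage is one entry per numeric attribute value, so space stays linear in the data no matter how many distinct attributes exist.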

If the problem as defined cannot be solved this way, are there any heuristics that would solve it without sacrificing too much?

Answer 1

If I understand your idea correctly:

-- in MySQL, key is a reserved word and must be quoted as `key` here and in the queries
create table t
( id int not null
, kind varchar(...) not null
, key varchar(...) not null
, val varchar(...) not null
, primary key (id, kind, key) );
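To experiment with this layout without a MySQL server, the table can be created in SQLite through Python's built-in `sqlite3` module. This is a sketch with assumptions: `TEXT` stands in for the elided `varchar(...)` lengths, `kind` is taken from each record's own `kind` attribute, numeric values are stored without units, and `load_eav` is a hypothetical helper.

```python
import sqlite3

# Example records keyed by id; numeric values without units (an assumption).
RECORDS = {
    1: {"kind": "cat", "weight": 25, "color": "red", "age": 10,
        "fluffiness": 98, "attitude": "grumpy"},
    2: {"kind": "robot", "chassis": "aluminum", "year": 2015,
        "hardware": "intel curie", "battery": 5000, "bat-life": 168, "weight": 0.5},
    3: {"kind": "lightsaber", "color": "red", "type": "single blade",
        "power": 1000, "weight": 25, "creator": "Darth Vader"},
}


def load_eav(conn, records):
    """Create the EAV table (TEXT in place of varchar(...)) and fill it."""
    conn.execute("""
        CREATE TABLE t (
            id   INTEGER NOT NULL,
            kind TEXT    NOT NULL,
            key  TEXT    NOT NULL,
            val  TEXT    NOT NULL,
            PRIMARY KEY (id, kind, key)
        )""")
    # One row per attribute; kind has its own column and every value is stored as text.
    rows = [(rid, attrs["kind"], key, str(value))
            for rid, attrs in records.items()
            for key, value in attrs.items() if key != "kind"]
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load_eav(conn, RECORDS)
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 16
```

Because every value lands in the same text column, the numeric meaning of `age` or `weight` is lost at the storage level, which leads directly to the casting problem described next.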

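This approach has a number of problems; search for "EAV" (entity-attribute-value) to learn more. One example is that you have to cast val to the appropriate type when comparing, because as strings '2' > '10' is true.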
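The comparison pitfall is easy to reproduce, using SQLite through Python's `sqlite3` as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Compared as text, '2' > '10' because the first characters compare '2' > '1'.
as_text = conn.execute("SELECT '2' > '10'").fetchone()[0]
# After casting both sides, the comparison is numeric: 2 > 10 is false.
as_int = conn.execute(
    "SELECT CAST('2' AS INTEGER) > CAST('10' AS INTEGER)").fetchone()[0]
print(as_text, as_int)  # 1 0
```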

That said, an index such as:

create unique index ix1 on t (kind, key, val, id)

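will ease the pain somewhat, but the design will not scale well, and with 1E6 rows and 1E5 attributes the performance will be far from good enough. Your example query would look like: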

-- MySQL requires an alias on every derived table and spells the cast as cast(val as signed)
select a.id
from ( select id
       from ( select id, val
              from t
              where kind = 'cat'
                and key = 'weight'
            ) as w
       where cast(val as int) > 20
     ) as a
join ( select id
       from ( select id, val
              from t
              where kind = 'cat'
                and key = 'age'
            ) as g
       where cast(val as int) > 9
     ) as b
     on a.id = b.id
join ( ...
                and key = 'fluffiness'
            ) as f
       where cast(val as int) > 98
     ) as c
     on a.id = c.id
join ...

Efficiently searching for records that match given attributes and their values (exact match, less than, greater than)
