
Question

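Suppose my entity entry has a ref-to-many attribute :entry/groups. How should I build a query that finds all entities whose :entry/groups attribute contains all of my input foreign ids?

The following pseudocode illustrates the question better: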

[2 3] ; having this as input foreign ids

;; and having these entry entities in db
[{:entry/id "A" :entry/groups  [2 3 4]}  
 {:entry/id "B" :entry/groups  [2]}     
 {:entry/id "C" :entry/groups  [2 3]}  
 {:entry/id "D" :entry/groups  [1 2 3]}
 {:entry/id "E" :entry/groups  [2 4]}] 

;; only A, C, D should be pulled

As a newcomer to Datomic / Datalog, I have exhausted all the options I could think of, so any help is appreciated. Thanks!

Answer 1

You can see an example of this in the James Bond example in the Tupelo-Datomic library. You just specify 2 clauses, one for each desired value in the set:

; Search for people that match both {:weapon/type :weapon/guile} and {:weapon/type :weapon/gun}
(let [tuple-set   (td/find :let    [$ (live-db)]
                           :find   [?name]
                           :where  {:person/name ?name :weapon/type :weapon/guile }
                                   {:person/name ?name :weapon/type :weapon/gun } ) ]
  (is (= #{["Dr No"] ["M"]} tuple-set )))

In plain Datomic it looks similar, but using something like the entity ID:

[?eid :entry/groups 2]
[?eid :entry/groups 3]

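and Datomic will perform an implicit AND operation (i.e. both clauses must match; any extra entries are ignored). Logically this is a "join" operation, even though the same entity is being queried for both values. You can find more information in the Datomic docs.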

Answer 2

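TL;DR

You are tackling the general problem of 'dynamic conjunction' in Datomic's Datalog.

There are 3 strategies here:

1. Write a dynamic Datalog query which uses 2 negations and 1 disjunction, or a recursive rule (see below).
2. Generate the query code (equivalent to Alan Thompson's answer): the drawback is the usual drawback of generating Datalog clauses dynamically, i.e. you don't benefit from query plan caching.
3. Use the indexes directly (EAVT or AVET).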

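Dynamic Datalog query

Datalog has no direct way of expressing dynamic conjunction (logical AND / 'for all ...' / set intersection). However, you can achieve it in pure Datalog by combining one disjunction (logical OR / 'exists ...' / set union) and two negations, i.e. (For all ?g in ?Gs p(?e,?g)) <=> NOT(Exists ?g in ?Gs, such that NOT(p(?e, ?g)))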
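To make the equivalence concrete, here is a minimal plain-Clojure sketch with no Datomic involved. The names `entries`, `has-all?`, `has-all-via-double-negation?` and `matching-ids` are made up for illustration, and `entries` simply mirrors the data from the question as a map from entry id to a set of group ids.

```clojure
;; Illustration only: an in-memory stand-in for the entries in the question.
(def entries
  {"A" #{2 3 4}
   "B" #{2}
   "C" #{2 3}
   "D" #{1 2 3}
   "E" #{2 4}})

(defn has-all?
  "For all g in wanted: g is in groups."
  [groups wanted]
  (every? #(contains? groups %) wanted))

(defn has-all-via-double-negation?
  "NOT (exists g in wanted such that NOT (g is in groups))."
  [groups wanted]
  (not (some #(not (contains? groups %)) wanted)))

(defn matching-ids
  "Ids of the entries whose groups satisfy `pred` for the `wanted` ids."
  [pred wanted]
  (into (sorted-set)
        (for [[id groups] entries
              :when (pred groups wanted)]
          id)))

(matching-ids has-all? [2 3])                    ;; => #{"A" "C" "D"}
(matching-ids has-all-via-double-negation? [2 3]) ;; => #{"A" "C" "D"}
```

Both predicates select the same entries, which is the property the Datalog formulation relies on.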

In your case, this could be expressed as:

[:find [?entry ...] :in $ ?groups :where
  ;; these 2 clauses are for restricting the set of considered datoms, which is more efficient (and necessary in Datomic's Datalog, which will refuse to scan the whole db)
  ;; NOTE: this imposes ?groups cannot be empty!
  [(first ?groups) ?group0]
  [?entry :entry/groups ?group0]
  ;; here comes the double negation
  (not-join [?entry ?groups]
    [(identity ?groups) [?group ...]]
    (not-join [?entry ?group]
      [?entry :entry/groups ?group]))]

Good news: this can be expressed as a very generic Datalog rule (which I may eventually add to Datofu):

[(matches-all ?e ?a ?vs)
 [(first ?vs) ?v0]
 [?e ?a ?v0]
 (not-join [?e ?a ?vs]
   [(seq ?vs) [?v ...]]
   (not-join [?e ?a ?v]
     [?e ?a ?v]))]

...which means your query can now be expressed as:

[:find [?entry ...] :in % $ ?groups :where
 (matches-all ?entry :entry/groups ?groups)]

Note: here is an alternative implementation using a recursive rule:

[[(matches-all ?e ?a ?vs)
  [(seq ?vs)]
  [(first ?vs) ?v]
  [?e ?a ?v]
  [(rest ?vs) ?vs2]
  (matches-all ?e ?a ?vs2)]
 [(matches-all ?e ?a ?vs)
  [(empty? ?vs)]]]

This one has the advantage of accepting an empty ?vs collection (as long as ?e and ?a have been bound in some other way in the query).
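The shape of the recursive rule can be mirrored in plain Clojure. The following is only a sketch of the recursion's logic, not of how Datomic evaluates rules; `matches-all?` is a hypothetical helper that takes the set of values an entity has and the collection of required values.

```clojure
(defn matches-all?
  "True when every element of `vs` is in the set `groups`.
  Mirrors the two rule branches: an empty `vs` succeeds (base case),
  otherwise the first value must match and the rest are checked recursively."
  [groups vs]
  (cond
    (empty? vs)                   true
    (contains? groups (first vs)) (recur groups (rest vs))
    :else                         false))

(matches-all? #{2 3 4} [2 3]) ;; => true
(matches-all? #{2} [2 3])     ;; => false
(matches-all? #{2} [])        ;; => true, the empty case is accepted
```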

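Generating the query code

The advantage of generating the query code is that it is relatively simple in this case, and that it can probably make query execution more efficient than the more dynamic alternative. The drawback of generating Datalog queries in Datomic is that you may lose the benefits of query plan caching; therefore, even if you are going to generate queries, you still want to make them as generic as possible (i.e. depending only on the number of v values).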

(defn q-find-having-all-vs 
  [n-vs]
  (let [v-syms (for [i (range n-vs)]
                 (symbol (str "?v" i)))]
    {:find '[[?e ...]]
     :in (into '[$ ?a] v-syms)
     :where 
     (for [?v v-syms]
       ['?e '?a ?v])}))

;; examples    
(q-find-having-all-vs 1)
=> {:find [[?e ...]], 
    :in [$ ?a ?v0],
    :where 
    ([?e ?a ?v0])}
(q-find-having-all-vs 2)
=> {:find [[?e ...]],
    :in [$ ?a ?v0 ?v1], 
    :where
    ([?e ?a ?v0] 
     [?e ?a ?v1])}
(q-find-having-all-vs 3)
=> {:find [[?e ...]], 
    :in [$ ?a ?v0 ?v1 ?v2], 
    :where 
    ([?e ?a ?v0] 
     [?e ?a ?v1]
     [?e ?a ?v2])}


;; executing the query: note that we're passing the attribute and values!
(apply d/q (q-find-having-all-vs (count groups))
  db :entry/groups groups)

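Using the indexes directly

I am not sure at all how efficient the above approaches are in the current implementation of Datomic Datalog. If your benchmarks show that they are slow, you can always fall back to direct index access.

Here is an example in Clojure using the AVET index: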

(defn find-having-all-vs
  "Given a database value `db`, an attribute identifier `a` and a non-empty seq of entity identifiers `vs`,
  returns a set of entity identifiers for entities which have all the values in `vs` via `a`"
  [db a vs]
  ;; DISCLAIMER: a LOT can be done to improve the efficiency of this code! 
  (apply clojure.set/intersection 
    (for [v vs]
      (into #{} 
        (map :e)
        (d/datoms db :avet a v)))))
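The intersection step can be tried without a database. The sketch below assumes a hypothetical in-memory inverted index, `by-group`, that plays the role of the per-value AVET lookups; `entries` and `find-having-all-groups` are likewise illustrative names, with `entries` mirroring the data from the question.

```clojure
(require '[clojure.set :as set])

;; Illustration only: entry id -> set of group ids, as in the question.
(def entries
  {"A" #{2 3 4}
   "B" #{2}
   "C" #{2 3}
   "D" #{1 2 3}
   "E" #{2 4}})

;; Inverted index: group id -> set of entry ids having that group.
(def by-group
  (reduce (fn [idx [id groups]]
            (reduce (fn [idx g] (update idx g (fnil conj #{}) id))
                    idx
                    groups))
          {}
          entries))

(defn find-having-all-groups
  "Intersects the per-value entry sets; `vs` must be non-empty."
  [index vs]
  (apply set/intersection
         (map #(get index % #{}) vs)))

(find-having-all-groups by-group [2 3]) ;; => #{"A" "C" "D"}
(find-having-all-groups by-group [2 9]) ;; => #{}
```

As with the original function, `vs` has to be non-empty, because `clojure.set/intersection` needs at least one set.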

Find entities whose ref-to-many attribute contains all elements of the input
