Question

我目前正在评估Datomic的存储和查询形成本体的解析符号的用例。数据库中总共有225122个符号（实体）（因此它是一个相当大的本体，但对于数据库来说不应该是一个大问题。）

结构很标准，符号有

包含它们的父符号（如子符号等）
supersymbols （他们继承的符号）

为了能够很好地访问这些符号，我们为每个符号都有一个唯一的private void alarmMethod(){ Intent myIntent = new Intent(this , NotifyService.class); AlarmManager alarmManager = (AlarmManager)getSystemService(ALARM_SERVICE); pendingIntent = PendingIntent.getService(this, 0, myIntent, 0); Calendar calendar = Calendar.getInstance(); calendar.set(Calendar.SECOND, 0); calendar.set(Calendar.MINUTE, 0); calendar.set(Calendar.HOUR, 3); calendar.set(Calendar.AM_PM, Calendar.PM); calendar.add(Calendar.DAY_OF_MONTH, 1); calendar.set(Calendar.DAY_OF_WEEK, 6); alarmManager.setRepeating(AlarmManager.RTC_WAKEUP, calendar.getTimeInMillis(), 1000 * 60 * 60 * 24, pendingIntent);。这相当于以下Datomic架构：

name

现在我有了最基本的递归查询＆＃34;给我符号[{:db/ident :ml/name, :db/valueType :db.type/string, :db/cardinality :db.cardinality/one, :db/unique :db.unique/identity} {:db/ident :ml/parent, :db/valueType :db.type/ref, :db/index true, :db/cardinality :db.cardinality/one} {:db/ident :ml/superclass, :db/valueType :db.type/ref, :db/index true, :db/cardinality :db.cardinality/one}]＆＃34;中包含的所有符号（传递）。在Datomic术语中：

查询本身（所以中等大小的符号）介于 5 和 5.5 秒之间，并返回80次点击。 不是毫秒，而是实际秒数。这只是我想要询问的关于数据集的最基本的查询（它旨在从网络工具中使用，以帮助建模人员理解本体的结构）。

我正在运行(def rules '[ [(ubersymbol ?c ?p) (?c :ml/parent ?p)] [(ubersymbol ?c ?p) (?c :ml/parent ?c1) (ubersymbol ?c1 ?p) ] ]) (q '[:find ?c ?n :in $ % :where (ubersymbol ?c ?d) [?d :ml/name "name of a root symbol"] [?c :ml/name ?n]] current-db rules)，使用内存数据库并使用对等库（我按"getting started"指南中所述启动了服务器。

非常感谢帮助为Datomic提供案例。

马库斯

Answer 1

修改

作为自己发现的fricke，它是子句排序的问题，但在查询中，而不是在规则集中。更有效的版本是：

[:find ?c ?n :in $ % :where
   [?d :ml/name "name of a root symbol"]
   (ubersymbol ?c ?d) 
   [?c :ml/name ?n]]

上述查询可以通过以下方式进一步改进：

使用查询参数而不是在查询正文中使用动态参数
使用查找引用按:ml/name

产生：

(d/q
  '[:find ?c ?n :in % $ ?d :where
    (ubersymbol ?c ?d)
    [?c :ml/name ?n]]
  rules current-db [:ml/name "name of a root symbol"])

我的理论是，您的规则不是以Datalog可以针对此读取模式进行优化的方式编写的 - 可能导致遍历所有实体。我建议按如下方式重写它们：

[[(ubersymbol ?c ?p) 
  (?c :ml/parent ?p)]
 [(ubersymbol ?c ?p) 
  ;; we bind a child of the ancestor, instead of a parent of the descendant
  (?c1 :ml/parent ?p)
  (ubersymbol ?c ?c1)]]

这种编写规则集的方法经过优化，可以找到某个节点的后代。您最初编写它的方式经过优化，可以找到某个节点的祖先。

在我的机器上使用Datomic 0.9.5385在50000个实体的平衡二叉树上进行快速基准测试表明，您可以通过第二种方法获得所需的性能。

Datomic的递归Datalog查询确实很慢

1 个答案:

修改