Question

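I have found advice on setting up tagging systems in relational and document databases, but nothing for graph or multi-model databases.

I am trying to set up a tagging system for documents (let's call them "articles") in ArangoDB. I can think of two obvious ways to store tags in a multi-model (graph + document) database like Arango: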

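as an array within each article document (document database style)
as a separate document collection, with each tag as a unique document and edges connecting tag documents to article documents (closer to relational database style)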

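Are these really the two main approaches? Neither seems ideal. For example:

If I store tags within each article document, I can index the tags, and presumably ArangoDB optimizes the space they use. However, I cannot use graph features to link or traverse tags (or I would have to do that separately).
If I store tags as separate tag documents, that seems like extra overhead (an extra query) when I just want the list of tags on a document.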

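Which brings me to an explicit question: regarding the latter option, is there any easy way to make connected tag documents show up automatically within the article documents? For example, an array property that somehow "mirrors" the tag.name property of the connected tag documents?

General advice is also welcome.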

Answer 1

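You have already mentioned most of the available decision criteria. Maybe I can add a few more:

Tags stored inside the documents can use array indexes for filtering, which makes queries on them fast. However, if the array holds plain tag strings, there is no place to attach a rating or an explanation to each item of that array. Counting the documents that carry a tag may also be more expensive this way than counting all edges originating from a particular tag, and the same may hold for finding all tags that match a search criterion.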
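To make the counting trade-off concrete, here is a minimal sketch in plain Python. It uses in-memory lists as stand-ins for the collections and makes no ArangoDB calls; the sample data, the `rating` attribute, and the function names are assumptions for illustration only. With embedded arrays every article has to be visited, while with edges the count for one tag is just the number of edges that start at that tag.

```python
from collections import Counter

# Hypothetical sample data: plain Python stand-ins, not ArangoDB API calls.
articles = [
    {"_key": "a1", "tags": ["graph", "arangodb"]},
    {"_key": "a2", "tags": ["graph"]},
    {"_key": "a3", "tags": ["modeling", "graph"]},
]

# Edge-style layout: one edge per (tag, article) pair.
# Unlike a flat array item, an edge can carry attributes such as a rating.
edges = [
    {"_from": "tags/graph", "_to": "articles/a1", "rating": 5},
    {"_from": "tags/arangodb", "_to": "articles/a1", "rating": 3},
    {"_from": "tags/graph", "_to": "articles/a2", "rating": 4},
    {"_from": "tags/modeling", "_to": "articles/a3", "rating": 2},
    {"_from": "tags/graph", "_to": "articles/a3", "rating": 1},
]


def count_from_arrays(articles):
    """Visit every article and tally the tags in its embedded array."""
    return Counter(tag for article in articles for tag in article["tags"])


def count_from_edges(edges):
    """Tally edges by the tag vertex they originate from."""
    return Counter(edge["_from"].split("/", 1)[1] for edge in edges)


array_counts = count_from_arrays(articles)
edge_counts = count_from_edges(edges)
```

Both tallies agree on the sample data; which one is cheaper in a real database depends on the indexes and on how many edges a popular tag accumulates.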

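One powerful feature of multi-model is that you do not have to decide between the two approaches. You can use an edge collection to connect tags that carry attributes to the documents, and also keep an indexed array with the same (flat) tags inside the documents. If you find that all (or most) of your queries use only one of the approaches, try converting the remaining queries and remove the other solution. If that does not work, your application simply needs both.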

In both cases, other documents that share the tags can be found in a subquery:

// Articles whose full-text indexed 'text' attribute matches the search term
LET docs = (FOR ftDoc IN FULLTEXT(articles, 'text', 'search') RETURN ftDoc)
// Distinct tags found on the matching articles
LET tags = UNIQUE(FLATTEN(docs[*].tags))
// Keys of all articles that carry at least one of those tags
LET otherArticles = (FOR oneTag IN tags
    FOR oneD IN articles FILTER oneTag IN oneD.tags RETURN DISTINCT oneD._key)
RETURN {articles: docs, tags: tags, otherArticles: otherArticles}
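The combined layout described above (an edge collection for tags with attributes plus a flat tag array inside each document) could be kept consistent by routing every tagging operation through one helper. The following is only a sketch in plain Python with in-memory stand-ins; the collection names `articles`, `tags`, and `tagged`, the helper `tag_article`, and the `rating` attribute are all hypothetical and not part of any ArangoDB API.

```python
# In-memory stand-ins for three hypothetical collections.
articles = {"a1": {"title": "Intro to AQL", "tags": []}}
tags = {}     # tag name -> tag document
tagged = []   # edge documents pointing from a tag to an article


def tag_article(article_key, tag_name, **edge_attrs):
    """Record a tag both as an edge and in the article's flat tag array."""
    article = articles[article_key]
    if tag_name in article["tags"]:
        return  # already tagged; leave both representations unchanged
    # Create the tag document on first use.
    tags.setdefault(tag_name, {"name": tag_name})
    # The edge can hold per-tagging attributes (e.g. a rating).
    tagged.append({
        "_from": f"tags/{tag_name}",
        "_to": f"articles/{article_key}",
        **edge_attrs,
    })
    # The flat array is what an array index would cover.
    article["tags"].append(tag_name)


tag_article("a1", "graph", rating=4)
tag_article("a1", "arangodb")
tag_article("a1", "graph", rating=1)  # duplicate, ignored
```

In a real deployment the two writes would need to happen atomically (for example inside a transaction), otherwise the array and the edges can drift apart.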

Answer 2

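The answer to your explicit question, whether connected documents can show up automatically in your document, is unfortunately no. I have built an ArangoDB graph with separate tag documents, but I am seriously considering converting them into properties on the individual items, since tags seem to fit the criteria for properties rather than for related items.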

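Mike Williamson wrote a good article on the matter: https://mikewilliamson.wordpress.com/2015/07/16/data-modeling-with-arangodb/

He argues that a single vertex with a large number of edges is slow, and that is exactly the situation a popular tag vertex ends up in.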
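Since connected tag documents do not appear inside an article automatically, a mirrored array has to be maintained by the application itself. One way to do that is to rebuild the array from the edges whenever tags change. The sketch below shows the idea in plain Python with in-memory stand-ins; `mirror_tags`, the sample keys, and the tag-to-article edge direction are assumptions for illustration, not ArangoDB features.

```python
def mirror_tags(articles, tags, edges):
    """Rewrite each article's 'tags' array from the tag documents linked to it."""
    names_by_article = {key: [] for key in articles}
    for edge in edges:
        tag_key = edge["_from"].split("/", 1)[1]
        article_key = edge["_to"].split("/", 1)[1]
        names_by_article[article_key].append(tags[tag_key]["name"])
    for key, names in names_by_article.items():
        # Sorted and de-duplicated so repeated runs give the same array.
        articles[key]["tags"] = sorted(set(names))
    return articles


# Hypothetical sample data.
articles = {
    "a1": {"title": "Tagging in a graph", "tags": []},
    "a2": {"title": "Edges vs arrays", "tags": ["stale"]},
}
tags = {"t1": {"name": "graph"}, "t2": {"name": "arango"}}
edges = [
    {"_from": "tags/t1", "_to": "articles/a1"},
    {"_from": "tags/t2", "_to": "articles/a1"},
    {"_from": "tags/t1", "_to": "articles/a2"},
]

mirror_tags(articles, tags, edges)

# Renaming a tag document only reaches the articles after another mirror pass.
tags["t2"]["name"] = "arangodb"
mirror_tags(articles, tags, edges)
```

The cost of such a pass grows with the number of edges, so for a popular tag it runs into the same dense-vertex concern mentioned above.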

Answer 3

@Joachim Bøgglid linked to Mike Williamson: https://mikewilliamson.wordpress.com/2015/07/16/data-modeling-with-arangodb/

I would agree with Williamson that "Compact by default" is generally the way to go. You can then extract nodes from properties if/when the actual need emerges. It also avoids creating an overly interconnected graph structure, which would be slow for all kinds of traversal queries. However, I think having a Tag vertex is good in this case, because you can then store metadata on the tag (like a count), and connect it to other tags and sub-tags. It seems very useful and foreseeable in the particular case of tags. Having a node which you can add more relationships to if/when you need them is also very extensible, so you keep your future options more open (or at least easier to pursue).

It seems Williamson agrees:

"But not everything belongs together. Any attribute that contains a complex data structure (like the “comments” array or the “tags” array) deserves a little scrutiny as it might make sense as a vertex (or vertices) of its own."

The original question by @ropeladder poses the main objection of extra overhead (an extra query). I think worrying too much about performance at this stage might be premature optimisation. After all, the extra query might be fast, or it might be included in the original query's result set. In any case, I would quote this:

“In general, it’s bad practise to try to conflate nodes to preserve query-time efficiency. If we model in accordance with the questions we want to ask of our data, an accurate representation of the domain will emerge. Graph databases maintain fast query times even when storing vast amounts of data. Learning to trust our graph database is important when learning to structure our graphs without denormalizing them.” - from page 64, chapter 'Avoiding Anti-patterns', in the book 'Graph Databases', a book co-written by Eifrem, the founder of Neo4j, another very popular native graph database. It's free and available online here: https://neo4j.com/graph-databases-book/

See also this article on some anti-patterns (dense vs sparse graphs), to supplement Williamson's points: https://neo4j.com/blog/dark-side-neo4j-worst-practices/

Extra section included for completeness, to those who want to dive a little bit deeper into this question:

Answering Williamson's own criteria for deciding whether a tag should be a Node on its own, instead of leaving it as a property on the document:

Will it be accessed on it’s own? (ie: showing tags without the document)

Yes. Browsing tags available in the system could be useful.

Will you be running a graph measurement (like GRAPH_BETWEENNESS) on it?

Unsure.

Will it be edited on it’s own?

Yes, probably. A user could edit it separately. Maybe an admin/moderator wants to clean up the tag names (correct spelling errors), or clean up their structure (if you have sub-tags).

Does/could the tags have relationships of it’s own? (assuming you care)

Yes. They could. Sub-tags, other kinds of content than merely documents.

Would/should this attribute exist without it’s parent vertex?

Yes. A tag could/should exist even if the last tagged document was deleted. Someone might want to use that tag later on, and it represents domain information you might want to preserve.
