Question

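I am starting to design a record-keeping database for the documents we manage on our system. Each document goes through a specific series of processing tasks, which I will refer to here as normalization, conversion, and extraction.

Document processing can fail at any of these steps, so I am looking for a solution where I can quickly store this information for archival purposes, but where I can also query it (and possibly aggregate it). If I defined my data structure in JSON, it might look like this: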

{ "10123": [
    { "queue": "converter",
      "startedAt": "date-here",
      "finishedAt": "date-here",
      "error": { "message": "error message", "stackTrace": "stack trace here" },
      "machine": "192.168.0.1"
    },
    { "queue": "extractor",
      "startedAt": "date-here",
      "finishedAt": "date-here",
      "error": { "message": "error message", "stackTrace": "stack trace here" },
      "machine": "192.168.0.1"
    },
    { "queue": "extractor",
      "startedAt": "date-here",
      "finishedAt": "date-here",
      "error": { "message": "error message", "stackTrace": "stack trace here" },
      "machine": "192.168.0.1"
    }
] }

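In an ideal world, I would be able to retrieve the full processing lifecycle for a specific document, and I should also be able to detect which steps failed and the average time each process takes.

Any tips on an ideal database solution for handling this? It would probably amount to a few thousand writes per day.

The main solution is written in Java, so the DB should have a Java driver.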
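To make those two queries concrete, here is a minimal in-memory sketch in plain Java (assuming Java 16+ for records). It does not talk to any database; it only shows the shape of the "which steps failed" and "average time per process" computations that the chosen store would need to support. The names ProcessingLog, Step, failures and averageMillisByQueue are hypothetical, the error object is simplified to a single string, and the timestamps are made-up sample values.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ProcessingLog {

    /** One processing attempt; error is null when the step succeeded (an assumption of this sketch). */
    record Step(String queue, Instant startedAt, Instant finishedAt, String error, String machine) {
        long millis() {
            return Duration.between(startedAt, finishedAt).toMillis();
        }

        boolean failed() {
            return error != null;
        }
    }

    /** Steps that ended with an error, in their original order. */
    static List<Step> failures(List<Step> steps) {
        return steps.stream().filter(Step::failed).collect(Collectors.toList());
    }

    /** Average duration in milliseconds per queue, over all attempts (failed ones included). */
    static Map<String, Double> averageMillisByQueue(List<Step> steps) {
        return steps.stream().collect(
            Collectors.groupingBy(Step::queue, Collectors.averagingLong(Step::millis)));
    }

    public static void main(String[] args) {
        // Hypothetical lifecycle for document 10123: one conversion, then a failed
        // extraction followed by a successful retry.
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        List<Step> doc10123 = List.of(
            new Step("converter", t0, t0.plusSeconds(2), null, "192.168.0.1"),
            new Step("extractor", t0.plusSeconds(2), t0.plusSeconds(3), "error message", "192.168.0.1"),
            new Step("extractor", t0.plusSeconds(3), t0.plusSeconds(6), null, "192.168.0.1"));

        System.out.println("failed steps: " + failures(doc10123).size());
        System.out.println("average ms per queue: " + averageMillisByQueue(doc10123));
    }
}
```

Whatever database is chosen, it would need to express the equivalent of the filter in failures and the group-and-average in averageMillisByQueue on the server side, ideally without loading every document into the application.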

Answer 1

MongoDB is the right choice, because it supports all of the features you expect out of the box:

documents / embedded documents
JSON compatible
supports queries (except joins, of course)
very fast
Java driver supported by 10gen

See the MongoDB use cases for details.

Database for tracking the status of documents within a document management system
