I would like some advice on implementing a database solution for the following problem:
1) 100 million XML documents are saved to the database per
day.
2) The database holds a maximum of 3 days of data.
3) There are 1 million query requests per day.
4) The values by which the documents are filtered are stored in
a separate table and mapped to the corresponding XML document ID.
5) Documents are requested by date range, by matching a list of
IDs, as the top 10 newest documents, or as records that are new
since the previous request.
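For a sense of scale, the figures above can be turned into average rates. This is only a back-of-the-envelope sketch, assuming the load is spread evenly over the day (real traffic is likely to be burstier); the variable names are made up for illustration.

```python
# Rough sizing from the stated requirements.
# Assumption: writes and queries arrive evenly across 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

docs_per_day = 100_000_000
queries_per_day = 1_000_000
retention_days = 3

writes_per_second = docs_per_day / SECONDS_PER_DAY
queries_per_second = queries_per_day / SECONDS_PER_DAY
docs_retained = docs_per_day * retention_days

print(f"average writes/s:   {writes_per_second:,.0f}")
print(f"average queries/s:  {queries_per_second:,.1f}")
print(f"documents retained: {docs_retained:,}")
```

That works out to roughly 1,157 writes per second against about 11.6 queries per second, with 300 million documents held at any time, so the workload is heavily write-dominated.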
This is what I have done so far:
1) Checked whether I can use Redis. It is limited to a few data types,
and I cannot use multiple where conditions to filter a Hash in
Redis. I need indexing on date and many other fields, and I was unable
to choose the right data structure to store this in a hash.
2) Investigated DynamoDB. It is again a key-value store where all the
filter conditions would have to be stored as one value. I am not sure
whether querying a JSON document to filter the right XML
document would be efficient.
3) Investigated Cassandra. It looks like it may fit my
requirements, but it has a limitation in that read operations
might be slow. Cassandra has the advantage of faster write operations
over changing data. This looks like the best possible solution I have
found so far.
Currently we are using SQL Server and are having performance issues, so we are looking for a better solution.
Please advise, thanks.
Answer 0 (score: 4)
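It is not that reads in Cassandra are likely to be slow, but that it is hard to guarantee a read SLA (usually they will be fast, but some of them will be slow).
Cassandra also lacks search capabilities that you may need in the future (ordering, searching on many fields, ranked searching). You can achieve this with Cassandra, but it clearly takes more effort than with a database designed for search operations.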
I suggest you look at Lucene / Elasticsearch. Let me quote Lucene's features from their main website:
Scalable
Powerful, accurate and efficient search algorithms
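
To illustrate how the access patterns in the question might map onto Elasticsearch, here is a minimal sketch that only builds a Query DSL search body (it does not contact a cluster). The field names `created_at` and `doc_id` and the helper `build_query` are hypothetical, chosen for illustration; the real mapping would depend on how the filter values are indexed.

```python
import json

def build_query(start, end, doc_ids=None, size=10):
    """Build an Elasticsearch search body (hypothetical field names).

    Filters on a date range, optionally on a list of document IDs,
    and returns the newest documents first.
    """
    # Filter clauses do not affect scoring, which suits exact matching.
    filters = [{"range": {"created_at": {"gte": start, "lt": end}}}]
    if doc_ids:
        filters.append({"terms": {"doc_id": list(doc_ids)}})
    return {
        "size": size,  # e.g. 10 for the "top 10 new documents" case
        "query": {"bool": {"filter": filters}},
        "sort": [{"created_at": {"order": "desc"}}],
    }

body = build_query("2024-01-01T00:00:00Z", "2024-01-02T00:00:00Z", doc_ids=[101, 102])
print(json.dumps(body, indent=2))
```

For the "new since the previous request" case, one option would be to pass the timestamp of the previous request as `start`. Whether a single time-sorted index is enough at this volume is something that would need to be measured.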