Question

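I want to build a distributed (cross-continent), fault-tolerant and fast store for images and files. A REST endpoint in front of the store will serve the images and/or files.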

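Images or files are stored/inserted from a central location, but they are served from web servers installed on local intranets, which authenticate and authorize the users.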

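One object can have multiple sizes of the same image, and possibly files related to it. Using the stores mentioned (HBase, MongoDB or Cassandra) lets me choose a column family and/or column qualifier to fetch the requested entity.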

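I did also consider a plain file system. To retrieve the requested entity I would either need to look up the correct path in a database, or the path would have to be designed intelligently. It would also mean creating folders at the start of each new year.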

An entity can have different sizes (thumbnail, grid, preview, etc.) for different years.
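To make the file-system option concrete, here is a minimal sketch of an "intelligently designed" path, assuming a layout of `<root>/<entityId>/<year>/<size>.bin`. The layout, the `.bin` suffix and the function names are assumptions for illustration only. Because folders are created on first write, nothing has to be set up when a new year begins.

```python
from pathlib import Path


def entity_path(root: Path, entity_id: int, year: int, size: str) -> Path:
    """Derive the file location purely from the request parameters."""
    return root / str(entity_id) / str(year) / f"{size}.bin"


def store(root: Path, entity_id: int, year: int, size: str, payload: bytes) -> Path:
    """Write one image, creating the entity/year folders only when needed."""
    path = entity_path(root, entity_id, year, size)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    return path


def list_sizes(root: Path, entity_id: int, year: int) -> list:
    """Return the sizes available for an entity in one year (empty if none)."""
    folder = root / str(entity_id) / str(year)
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.iterdir() if p.is_file())
```

With this layout no database lookup is needed to find a file: `entity_path(root, 123, 2017, "thumbnail")` gives the location directly, and `list_sizes(root, 123, 2017)` answers the "all images for a year" case by listing one folder.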

A request to fetch an image would look like:

entityId  123
year      2017 
size      thumbnail

A request to fetch all available images of a given entity for one year would look like:

entityId  123
year      2017

I am open to any other storage solution as long as the above can be achieved. Thanks for your help and suggestions.

Answer 1

You can build a filesystem-like table, as you suggested, like this:

cqlsh> use keyspace1;
cqlsh:keyspace1> create table filesystem(
             ...   entitiyId int,
             ...   year int,
             ...   size text,
             ...   payload blob,
             ...   primary key (entitiyId, year, size));
cqlsh:keyspace1> insert into filesystem (entitiyId, year, size, payload) values (1,2017,'small',textAsBlob('payload'));
cqlsh:keyspace1> insert into filesystem (entitiyId, year, size, payload) values (1,2017,'big',textAsBlob('payload'));
cqlsh:keyspace1> insert into filesystem (entitiyId, year, size, payload) values (1,2016,'small',textAsBlob('payload'));
cqlsh:keyspace1> insert into filesystem (entitiyId, year, size, payload) values (1,2016,'big',textAsBlob('payload'));
cqlsh:keyspace1> insert into filesystem (entitiyId, year, size, payload) values (2,2016,'small',textAsBlob('payload'));
cqlsh:keyspace1>
cqlsh:keyspace1>
cqlsh:keyspace1> select * from filesystem where entitiyId=1 and year=2016;

 entitiyid | year | size  | payload
-----------+------+-------+------------------
         1 | 2016 |   big | 0x7061796c6f6164
         1 | 2016 | small | 0x7061796c6f6164

(2 rows)
cqlsh:keyspace1>

and

cqlsh:keyspace1> select * from filesystem where entitiyId=1 and year=2016 and size='small';

 entitiyid | year | size  | payload
-----------+------+-------+------------------
         1 | 2016 | small | 0x7061796c6f6164

(1 rows)
cqlsh:keyspace1>

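With this approach you cannot select an image by a specific size and ID without also specifying the year, because year comes before size in the primary key.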
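The reason can be illustrated with a toy model. This is a simplified sketch, not Cassandra's actual storage engine: it assumes that the rows of one partition (one entitiyId) are kept sorted by the clustering columns (year, size), and the function names are made up for the example.

```python
from bisect import bisect_left

# One partition (entitiyId = 1); rows are kept sorted by (year, size).
partition = sorted([
    (2016, "big"), (2016, "small"),
    (2017, "big"), (2017, "small"),
])


def by_year(rows, year):
    """Prefix lookup: all rows of one year form a contiguous slice."""
    lo = bisect_left(rows, (year,))
    hi = bisect_left(rows, (year + 1,))
    return rows[lo:hi]


def by_size(rows, size):
    """No year given: matches are scattered, so every row has to be checked."""
    return [row for row in rows if row[1] == size]
```

Here `by_year` can jump straight to the matching range, while `by_size` has to scan the whole partition. If lookups by size without a year are needed, one common option is a second table whose primary key puts size before year.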

For related files, you can build a list of foreign entitiyIds, or use a separate grouping table, to keep them together.

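However, the Cassandra blob type is theoretically limited to 2 GB, while the practical limit, if you need performance, is around 1 MB, or a few MB in rare cases (with larger blobs, performance degrades in many ways). If that is not a problem, go ahead and try it.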
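If some files are larger than that, one possible workaround is to split the payload on the client and store each piece as its own row. The sketch below only shows the splitting and reassembly. The 1 MB chunk size follows the practical limit mentioned above, and the idea of an extra chunk-index column in the table is an assumption, not part of the schema shown earlier.

```python
CHUNK_SIZE = 1024 * 1024  # assumed 1 MB per chunk


def split_payload(payload: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Return (chunk_index, chunk_bytes) pairs covering the payload in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset // chunk_size, payload[offset:offset + chunk_size])
        for offset in range(0, len(payload), chunk_size)
    ]


def join_chunks(chunks) -> bytes:
    """Reassemble the payload from chunks, whatever order they arrive in."""
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    return b"".join(data for _, data in ordered)
```

Each pair returned by `split_payload` would become one row, with the index as an additional clustering column, so that a reader can fetch the chunks in order and pass them to `join_chunks`.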

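Another idea is to store the actual data in something like AWS S3 with cross-region replication enabled, and keep the metadata in Cassandra. But if you go with AWS anyway, they also have EFS with cross-region replication.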

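MongoDB can also be deployed easily with cross-region replication (https://docs.mongodb.com/manual/tutorial/deploy-geographically-distributed-replica-set/). In MongoDB you can keep all the data in one document and query only the relevant parts of it. In my opinion, MongoDB needs more housekeeping than Cassandra (more configuration and planning).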

Image and file storage - HBase, MongoDB or Cassandra
