Cassandra - .db files larger than the actual data

Time: 2018-03-14 10:45:03

Tags: cassandra diskspace

We are currently migrating Cassandra (2.x to 3.11.1). When I exported the data as plain text (i.e., as prepared INSERT statements) and checked the file size, I was shocked.

The actual data size in the txt export is 11.7 GB. The combined file size of all .db files is 127 GB.

All keyspaces are configured with SizeTieredCompactionStrategy compaction and LZ4 compression:

AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

So why are the files on disk 10x larger than the actual data, and how can I shrink these files so they (somewhat) reflect the actual data size?
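For reference, the blow-up factor implied by the sizes stated above works out as follows (a quick arithmetic sketch; the two sizes are the ones from the question):

```python
# Sizes as stated in the question (GB).
plain_text_gb = 11.7   # plain-text export (prepared INSERT statements)
on_disk_gb = 127.0     # combined size of all .db files

# Blow-up factor between the on-disk SSTables and the exported data.
ratio = on_disk_gb / plain_text_gb
print(f"On-disk data is {ratio:.1f}x the plain-text export")
```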

Please note: all of the data is simple time series containing a timestamp and values (min, max, avg, count, strings, ...).

Schema:

CREATE TABLE prod.data (
datainput bigint,
aggregation int,
timestamp bigint,
avg double,
count double,
flags int,
max double,
min double,
sum double,
val_d double,
val_l bigint,
val_str text,
PRIMARY KEY (datainput, aggregation, timestamp)
) WITH CLUSTERING ORDER BY (aggregation ASC, timestamp ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 1.0
    AND speculative_retry = '99PERCENTILE';
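As a rough sanity check on the expected payload per row, the fixed-width columns of this schema can be sized with `struct` (a sketch only; it ignores Cassandra's per-cell storage overhead, write timestamps, and the variable-length `val_str` column):

```python
import struct

# Big-endian fixed-width encodings roughly matching the CQL types.
# Primary key: datainput bigint, aggregation int, timestamp bigint.
key_bytes = struct.calcsize(">qiq")         # 8 + 4 + 8 = 20 bytes

# Values: avg, count, max, min, sum, val_d (double), flags (int), val_l (bigint).
value_bytes = struct.calcsize(">ddddddiq")  # 6*8 + 4 + 8 = 60 bytes

print(f"Fixed-width payload per row: {key_bytes + value_bytes} bytes")
```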

Thanks, everyone!

Update

  • Added schema
  • Fixed Cassandra version (3.1 => 3.11.1)

0 Answers:

There are no answers.