Cassandra Table需要大量的存储空间

时间:2015-06-17 14:48:33

标签: cassandra database-replication cassandra-2.1

我只是通过Spark-Cassandra-Connector将一个3 GB的CSV存储到Cassandra集群中的一个表中(5个节点,每个30 GB,每个实例7.5 GB RAM,cassandra使用~1.8 GB)。

我通过DataOpsCenter看到,我的群集拥有16 GB的数据(每个节点大约3.x GB),我的存储使用量从14 GB(之前)增加到64 GB(写入过程之后)!

我的密钥库有以下设置:

replica_placement_strategy  org.apache.cassandra.locator.SimpleStrategy
replication_factor  2

CREATE TABLE debs.energydata10m (
  id int PRIMARY KEY,
  house_id int,
  household_id int,
  plug_id int,
  ts timestamp,
  type int,
  val float
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='{"keys":"ALL", "rows_per_partition":"NONE"}' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.000000 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

为什么Cassandra需要这个3.1 GB CSV的存储空间?

编辑:这是 ls -lR / var / lib / cassandra / data / debs / 命令的输出:

ubuntu@ip-xx-xx-xx-xx:~$ ls -lR /var/lib/cassandra/data/debs/
/var/lib/cassandra/data/debs/:
total 24
drwxr-xr-x 2 cassandra cassandra     6 Jun 16 12:43 energydata1000m-52502e00142511e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra 16384 Jun 17 13:39 energydata100m-4cb23100142511e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra     6 Jun 17 08:41 energydata10m-46487f90142511e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra  4096 Jun 17 10:58 energydata10m-f17f204014d811e5b5ddabd6d8b6d1d3
drwxr-xr-x 3 cassandra cassandra    22 Jun 17 10:07 energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3
drwxr-xr-x 2 cassandra cassandra     6 Jun 16 12:40 energydata-d615ace0141d11e5b5ddabd6d8b6d1d3

/var/lib/cassandra/data/debs/energydata1000m-52502e00142511e5b5ddabd6d8b6d1d3:
total 0

/var/lib/cassandra/data/debs/energydata100m-4cb23100142511e5b5ddabd6d8b6d1d3:
total 3294336
-rw-r--r-- 1 cassandra cassandra     361779 Jun 17 12:36 debs-energydata100m-ka-187-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra  943405306 Jun 17 12:36 debs-energydata100m-ka-187-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 12:36 debs-energydata100m-ka-187-Digest.sha1
-rw-r--r-- 1 cassandra cassandra   17615016 Jun 17 12:36 debs-energydata100m-ka-187-Filter.db
-rw-r--r-- 1 cassandra cassandra  254001924 Jun 17 12:36 debs-energydata100m-ka-187-Index.db
-rw-r--r-- 1 cassandra cassandra       9911 Jun 17 12:36 debs-energydata100m-ka-187-Statistics.db
-rw-r--r-- 1 cassandra cassandra    1763968 Jun 17 12:36 debs-energydata100m-ka-187-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 12:36 debs-energydata100m-ka-187-TOC.txt
-rw-r--r-- 1 cassandra cassandra      46747 Jun 17 12:25 debs-energydata100m-ka-211-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra  120719760 Jun 17 12:25 debs-energydata100m-ka-211-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 12:25 debs-energydata100m-ka-211-Digest.sha1
-rw-r--r-- 1 cassandra cassandra    2266552 Jun 17 12:25 debs-energydata100m-ka-211-Filter.db
-rw-r--r-- 1 cassandra cassandra   32799168 Jun 17 12:25 debs-energydata100m-ka-211-Index.db
-rw-r--r-- 1 cassandra cassandra       9955 Jun 17 12:25 debs-energydata100m-ka-211-Statistics.db
-rw-r--r-- 1 cassandra cassandra     227840 Jun 17 12:25 debs-energydata100m-ka-211-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 12:25 debs-energydata100m-ka-211-TOC.txt
-rw-r--r-- 1 cassandra cassandra     400275 Jun 17 13:39 debs-energydata100m-ka-353-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 1053658168 Jun 17 13:39 debs-energydata100m-ka-353-Data.db
-rw-r--r-- 1 cassandra cassandra          9 Jun 17 13:39 debs-energydata100m-ka-353-Digest.sha1
-rw-r--r-- 1 cassandra cassandra   19254504 Jun 17 13:39 debs-energydata100m-ka-353-Filter.db
-rw-r--r-- 1 cassandra cassandra  281034756 Jun 17 13:39 debs-energydata100m-ka-353-Index.db
-rw-r--r-- 1 cassandra cassandra       9911 Jun 17 13:39 debs-energydata100m-ka-353-Statistics.db
-rw-r--r-- 1 cassandra cassandra    1951696 Jun 17 13:39 debs-energydata100m-ka-353-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 13:39 debs-energydata100m-ka-353-TOC.txt
-rw-r--r-- 1 cassandra cassandra     106147 Jun 17 13:32 debs-energydata100m-ka-377-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra  275239666 Jun 17 13:32 debs-energydata100m-ka-377-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 13:32 debs-energydata100m-ka-377-Digest.sha1
-rw-r--r-- 1 cassandra cassandra    5209632 Jun 17 13:32 debs-energydata100m-ka-377-Filter.db
-rw-r--r-- 1 cassandra cassandra   74503386 Jun 17 13:32 debs-energydata100m-ka-377-Index.db
-rw-r--r-- 1 cassandra cassandra       9935 Jun 17 13:32 debs-energydata100m-ka-377-Statistics.db
-rw-r--r-- 1 cassandra cassandra     517456 Jun 17 13:32 debs-energydata100m-ka-377-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 13:32 debs-energydata100m-ka-377-TOC.txt
-rw-r--r-- 1 cassandra cassandra      63267 Jun 17 13:36 debs-energydata100m-ka-392-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra  163610575 Jun 17 13:36 debs-energydata100m-ka-392-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 13:36 debs-energydata100m-ka-392-Digest.sha1
-rw-r--r-- 1 cassandra cassandra    3146928 Jun 17 13:36 debs-energydata100m-ka-392-Filter.db
-rw-r--r-- 1 cassandra cassandra   44398512 Jun 17 13:36 debs-energydata100m-ka-392-Index.db
-rw-r--r-- 1 cassandra cassandra       9971 Jun 17 13:36 debs-energydata100m-ka-392-Statistics.db
-rw-r--r-- 1 cassandra cassandra     308400 Jun 17 13:36 debs-energydata100m-ka-392-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 13:36 debs-energydata100m-ka-392-TOC.txt
-rw-r--r-- 1 cassandra cassandra      16475 Jun 17 13:37 debs-energydata100m-ka-398-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra   42447012 Jun 17 13:37 debs-energydata100m-ka-398-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 13:37 debs-energydata100m-ka-398-Digest.sha1
-rw-r--r-- 1 cassandra cassandra     819112 Jun 17 13:37 debs-energydata100m-ka-398-Filter.db
-rw-r--r-- 1 cassandra cassandra   11540160 Jun 17 13:37 debs-energydata100m-ka-398-Index.db
-rw-r--r-- 1 cassandra cassandra       9915 Jun 17 13:37 debs-energydata100m-ka-398-Statistics.db
-rw-r--r-- 1 cassandra cassandra      80208 Jun 17 13:37 debs-energydata100m-ka-398-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 13:37 debs-energydata100m-ka-398-TOC.txt
-rw-r--r-- 1 cassandra cassandra       3307 Jun 17 13:37 debs-energydata100m-ka-399-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra    8375321 Jun 17 13:37 debs-energydata100m-ka-399-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 13:37 debs-energydata100m-ka-399-Digest.sha1
-rw-r--r-- 1 cassandra cassandra     159248 Jun 17 13:37 debs-energydata100m-ka-399-Filter.db
-rw-r--r-- 1 cassandra cassandra    2292966 Jun 17 13:37 debs-energydata100m-ka-399-Index.db
-rw-r--r-- 1 cassandra cassandra       9895 Jun 17 13:37 debs-energydata100m-ka-399-Statistics.db
-rw-r--r-- 1 cassandra cassandra      16000 Jun 17 13:37 debs-energydata100m-ka-399-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 13:37 debs-energydata100m-ka-399-TOC.txt
-rw-r--r-- 1 cassandra cassandra       3299 Jun 17 13:39 debs-energydata100m-ka-400-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra    8332947 Jun 17 13:39 debs-energydata100m-ka-400-Data.db
-rw-r--r-- 1 cassandra cassandra         10 Jun 17 13:39 debs-energydata100m-ka-400-Digest.sha1
-rw-r--r-- 1 cassandra cassandra     159088 Jun 17 13:39 debs-energydata100m-ka-400-Filter.db
-rw-r--r-- 1 cassandra cassandra    2290716 Jun 17 13:39 debs-energydata100m-ka-400-Index.db
-rw-r--r-- 1 cassandra cassandra       9895 Jun 17 13:39 debs-energydata100m-ka-400-Statistics.db
-rw-r--r-- 1 cassandra cassandra      15984 Jun 17 13:39 debs-energydata100m-ka-400-Summary.db
-rw-r--r-- 1 cassandra cassandra         91 Jun 17 13:39 debs-energydata100m-ka-400-TOC.txt

/var/lib/cassandra/data/debs/energydata10m-46487f90142511e5b5ddabd6d8b6d1d3:
total 0

/var/lib/cassandra/data/debs/energydata10m-f17f204014d811e5b5ddabd6d8b6d1d3:
total 326684
-rw-r--r-- 1 cassandra cassandra     95051 Jun 17 10:30 debs-energydata10m-ka-37-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 245687780 Jun 17 10:30 debs-energydata10m-ka-37-Data.db
-rw-r--r-- 1 cassandra cassandra        10 Jun 17 10:30 debs-energydata10m-ka-37-Digest.sha1
-rw-r--r-- 1 cassandra cassandra   4617168 Jun 17 10:30 debs-energydata10m-ka-37-Filter.db
-rw-r--r-- 1 cassandra cassandra  66716856 Jun 17 10:30 debs-energydata10m-ka-37-Index.db
-rw-r--r-- 1 cassandra cassandra      9923 Jun 17 10:30 debs-energydata10m-ka-37-Statistics.db
-rw-r--r-- 1 cassandra cassandra    463376 Jun 17 10:30 debs-energydata10m-ka-37-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 10:30 debs-energydata10m-ka-37-TOC.txt
-rw-r--r-- 1 cassandra cassandra      3379 Jun 17 10:28 debs-energydata10m-ka-38-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra   8505046 Jun 17 10:28 debs-energydata10m-ka-38-Data.db
-rw-r--r-- 1 cassandra cassandra         9 Jun 17 10:28 debs-energydata10m-ka-38-Digest.sha1
-rw-r--r-- 1 cassandra cassandra    162984 Jun 17 10:28 debs-energydata10m-ka-38-Filter.db
-rw-r--r-- 1 cassandra cassandra   2346732 Jun 17 10:28 debs-energydata10m-ka-38-Index.db
-rw-r--r-- 1 cassandra cassandra      9895 Jun 17 10:28 debs-energydata10m-ka-38-Statistics.db
-rw-r--r-- 1 cassandra cassandra     16368 Jun 17 10:28 debs-energydata10m-ka-38-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 10:28 debs-energydata10m-ka-38-TOC.txt
-rw-r--r-- 1 cassandra cassandra      1811 Jun 17 10:58 debs-energydata10m-ka-39-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra   4475513 Jun 17 10:58 debs-energydata10m-ka-39-Data.db
-rw-r--r-- 1 cassandra cassandra        10 Jun 17 10:58 debs-energydata10m-ka-39-Digest.sha1
-rw-r--r-- 1 cassandra cassandra     86392 Jun 17 10:58 debs-energydata10m-ka-39-Filter.db
-rw-r--r-- 1 cassandra cassandra   1243818 Jun 17 10:58 debs-energydata10m-ka-39-Index.db
-rw-r--r-- 1 cassandra cassandra      9895 Jun 17 10:58 debs-energydata10m-ka-39-Statistics.db
-rw-r--r-- 1 cassandra cassandra      8704 Jun 17 10:58 debs-energydata10m-ka-39-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 10:58 debs-energydata10m-ka-39-TOC.txt

/var/lib/cassandra/data/debs/energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3:
total 0
drwxr-xr-x 3 cassandra cassandra 40 Jun 17 10:07 snapshots

/var/lib/cassandra/data/debs/energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3/snapshots:
total 4
drwxr-xr-x 2 cassandra cassandra 4096 Jun 17 10:07 1434535647574-energydata10m

/var/lib/cassandra/data/debs/energydata10m-fa83059014cd11e5b5ddabd6d8b6d1d3/snapshots/1434535647574-energydata10m:
total 326784
-rw-r--r-- 1 cassandra cassandra     92923 Jun 17 09:15 debs-energydata10m-ka-37-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra 240323836 Jun 17 09:15 debs-energydata10m-ka-37-Data.db
-rw-r--r-- 1 cassandra cassandra        10 Jun 17 09:15 debs-energydata10m-ka-37-Digest.sha1
-rw-r--r-- 1 cassandra cassandra   4520064 Jun 17 09:15 debs-energydata10m-ka-37-Filter.db
-rw-r--r-- 1 cassandra cassandra  65218608 Jun 17 09:15 debs-energydata10m-ka-37-Index.db
-rw-r--r-- 1 cassandra cassandra      9919 Jun 17 09:15 debs-energydata10m-ka-37-Statistics.db
-rw-r--r-- 1 cassandra cassandra    452976 Jun 17 09:15 debs-energydata10m-ka-37-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 09:15 debs-energydata10m-ka-37-TOC.txt
-rw-r--r-- 1 cassandra cassandra      3307 Jun 17 09:14 debs-energydata10m-ka-38-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra   8321541 Jun 17 09:14 debs-energydata10m-ka-38-Data.db
-rw-r--r-- 1 cassandra cassandra        10 Jun 17 09:14 debs-energydata10m-ka-38-Digest.sha1
-rw-r--r-- 1 cassandra cassandra    159384 Jun 17 09:14 debs-energydata10m-ka-38-Filter.db
-rw-r--r-- 1 cassandra cassandra   2294964 Jun 17 09:14 debs-energydata10m-ka-38-Index.db
-rw-r--r-- 1 cassandra cassandra      9895 Jun 17 09:14 debs-energydata10m-ka-38-Statistics.db
-rw-r--r-- 1 cassandra cassandra     16016 Jun 17 09:14 debs-energydata10m-ka-38-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 09:14 debs-energydata10m-ka-38-TOC.txt
-rw-r--r-- 1 cassandra cassandra      3307 Jun 17 09:15 debs-energydata10m-ka-39-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra   8316992 Jun 17 09:15 debs-energydata10m-ka-39-Data.db
-rw-r--r-- 1 cassandra cassandra        10 Jun 17 09:15 debs-energydata10m-ka-39-Digest.sha1
-rw-r--r-- 1 cassandra cassandra    159296 Jun 17 09:15 debs-energydata10m-ka-39-Filter.db
-rw-r--r-- 1 cassandra cassandra   2293614 Jun 17 09:15 debs-energydata10m-ka-39-Index.db
-rw-r--r-- 1 cassandra cassandra      9895 Jun 17 09:15 debs-energydata10m-ka-39-Statistics.db
-rw-r--r-- 1 cassandra cassandra     16000 Jun 17 09:15 debs-energydata10m-ka-39-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 09:15 debs-energydata10m-ka-39-TOC.txt
-rw-r--r-- 1 cassandra cassandra       755 Jun 17 10:07 debs-energydata10m-ka-40-CompressionInfo.db
-rw-r--r-- 1 cassandra cassandra   1781300 Jun 17 10:07 debs-energydata10m-ka-40-Data.db
-rw-r--r-- 1 cassandra cassandra        10 Jun 17 10:07 debs-energydata10m-ka-40-Digest.sha1
-rw-r--r-- 1 cassandra cassandra     34752 Jun 17 10:07 debs-energydata10m-ka-40-Filter.db
-rw-r--r-- 1 cassandra cassandra    500220 Jun 17 10:07 debs-energydata10m-ka-40-Index.db
-rw-r--r-- 1 cassandra cassandra      9895 Jun 17 10:07 debs-energydata10m-ka-40-Statistics.db
-rw-r--r-- 1 cassandra cassandra      3552 Jun 17 10:07 debs-energydata10m-ka-40-Summary.db
-rw-r--r-- 1 cassandra cassandra        91 Jun 17 10:07 debs-energydata10m-ka-40-TOC.txt
-rw-r--r-- 1 cassandra cassandra       152 Jun 17 10:07 manifest.json

/var/lib/cassandra/data/debs/energydata-d615ace0141d11e5b5ddabd6d8b6d1d3:
total 0

信息:energydata10m或energydata1000m的数据在energydata100m(启动前14 GB磁盘空间)的写入过程之前就已经存在了!

**************编辑**************
我在这里找到了计算公式:http://docs.datastax.com/en/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html他们说磁盘上的数据可能远远高于原始数据集。有人可以解释如何计算上面链接的值吗?我不知道所需的数据大小......

1 个答案:

答案 0 :(得分:0)

以下文档说明了数据大小及其计算: shown by Charles Duffy