Question

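I'm new to Cassandra (within the last month), coming from a long SQL Server background. I've been tasked with writing some Python to automate bulk loading of sstables. Enter sstableloader. Everything I've installed so far is for testing. I have a VM with Cassandra installed as a single-node cluster. That took some setup and a loopback IP address, so I have 127.0.0.1 and 127.0.0.2, with the seed set to 127.0.0.1. I got Cassandra up and running, and I can reach it from other boxes through a simple connection string in Python, so most of my requirements are met. Where I'm stuck is loading data through anything other than CQL. I can get data in all day with INSERT statements; what I need is to run json2sstable and sstableloader successfully (separately, at this point). The kicker: both report that everything went fine... and in neither case does my data show up. Here's how I reproduce the problem.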

Keyspace, column family and folder: sampledb_adl, emp_new_9, /var/lib/cassandra/data/emp_new_9

Table created at the cqlsh prompt:
CREATE TABLE emp_new_9 (pkreq uuid, empid int, deptid int, first_name text, last_name text, PRIMARY KEY ((pkreq))) WITH
  bloom_filter_fp_chance=0.010000 AND 
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

Initial data entered into the table via cqlsh: INSERT INTO emp_new_9 (pkreq, empid, deptid, first_name, last_name) VALUES (uuid(), 30001, 235, 'yogi', 'bear');

Result of 'select * from emp_new_9':

 pkreq                                | deptid | empid | first_name | last_name
--------------------------------------+--------+-------+------------+-----------
 9c6dd9de-f6b1-4312-9737-e9d00b8187f3 |    235 | 30001 |       yogi |      bear

Ran nodetool flush.

Contents of the emp_new_9 folder at this point:

sampledb_adl-emp_new_9-jb-1-CompressionInfo.db  sampledb_adl-emp_new_9-jb-1-Index.db       sampledb_adl-emp_new_9-jb-1-TOC.txt
sampledb_adl-emp_new_9-jb-1-Data.db             sampledb_adl-emp_new_9-jb-1-Statistics.db
sampledb_adl-emp_new_9-jb-1-Filter.db           sampledb_adl-emp_new_9-jb-1-Summary.db

Current result: [root@localhost emp_new_9]# sstable2json /var/lib/cassandra/data/sampledb_adl/emp_new_9/sampledb_adl-emp_new_9-jb-1-Data.db

[
{"key": "9c6dd9def6b143129737e9d00b8187f3","columns": [["","",1443108919841000], ["deptid","235",1443108919841000],     ["empid","30001",1443108919841000], ["first_name","yogi",1443108919841000], ["last_name","bear",1443108919841000]]}
]
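For the Python automation, note that the "key" in the sstable2json output is just the pkreq uuid written as 32 hex digits without dashes (9c6dd9def6b1... is 9c6dd9de-f6b1-...). Below is a minimal sketch, assuming the dump is valid JSON exactly as shown, that turns such output back into rows. The function name rows_from_sstable2json is hypothetical, and the empty-named ["", "", timestamp] entry is skipped on the assumption that it is the CQL row marker rather than a column value.

```python
import json
import uuid

# sstable2json output from above (one partition of emp_new_9).
SSTABLE2JSON_OUTPUT = """[
{"key": "9c6dd9def6b143129737e9d00b8187f3","columns": [["","",1443108919841000], ["deptid","235",1443108919841000], ["empid","30001",1443108919841000], ["first_name","yogi",1443108919841000], ["last_name","bear",1443108919841000]]}
]"""


def rows_from_sstable2json(text):
    """Turn sstable2json output into dicts keyed by CQL column name.

    Assumes a uuid partition key (pkreq), so the hex "key" is decoded with uuid.UUID.
    """
    rows = []
    for partition in json.loads(text):
        row = {"pkreq": str(uuid.UUID(hex=partition["key"]))}
        for column in partition["columns"]:
            name, value = column[0], column[1]  # column[2] is the write timestamp
            if name:  # skip the empty-named entry (assumed row marker)
                row[name] = value
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in rows_from_sstable2json(SSTABLE2JSON_OUTPUT):
        print(row)
```

Values come back as strings because that is how sstable2json prints them.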

Now create emp_new_10 with different data:

Keyspace, column family and folder: sampledb_adl, emp_new_10, /var/lib/cassandra/data/emp_new_10

Table created at the cqlsh prompt:
CREATE TABLE emp_new_10 (pkreq uuid, empid int, deptid int, first_name text, last_name text, PRIMARY KEY ((pkreq))) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

Initial data entered into the table via cqlsh: INSERT INTO emp_new_10 (pkreq, empid, deptid, first_name, last_name) VALUES (uuid(), 30101, 298, 'scoobie', 'doo');

Result of 'select * from emp_new_10':

Ran nodetool flush.

Contents of the emp_new_10 folder at this point:

sampledb_adl-emp_new_10-jb-1-CompressionInfo.db  sampledb_adl-emp_new_10-jb-1-Index.db       sampledb_adl-emp_new_10-jb-1-TOC.txt
sampledb_adl-emp_new_10-jb-1-Data.db             sampledb_adl-emp_new_10-jb-1-Statistics.db
sampledb_adl-emp_new_10-jb-1-Filter.db           sampledb_adl-emp_new_10-jb-1-Summary.db

Current result: [root@localhost emp_new_10]# sstable2json /var/lib/cassandra/data/sampledb_adl/emp_new_10/sampledb_adl-emp_new_10-jb-1-Data.db

[
{"key": "c0e1763d8b2b45939dafaf3596ed08be","columns": [["","",1443109509458000], ["deptid","298",1443109509458000],     ["empid","30101",1443109509458000], ["first_name","scoobie",1443109509458000], ["last_name","doo",1443109509458000]]}
]

So: yogi is in 9, scoobie is in 10.
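Since the eventual goal is to script bulk loads, a JSON file in the same shape as the dumps above could be generated instead of hand-copied. This is a sketch under the assumption that json2sstable accepts exactly the shape sstable2json printed (the import below reports "1 keys imported", which suggests it does); the helper names are hypothetical. Each new row gets its own uuid4, the same kind of random version 4 UUID that CQL's uuid() produces.

```python
import json
import time
import uuid


def to_json2sstable_partition(pkreq, empid, deptid, first_name, last_name, timestamp_us=None):
    """Build one emp_new_* partition in the shape sstable2json printed above.

    Assumption: json2sstable accepts this same shape (hex uuid key,
    [name, value, timestamp] entries, values written as strings).
    """
    if timestamp_us is None:
        # The write timestamps in the dumps above are microseconds since the epoch.
        timestamp_us = int(time.time() * 1000000)
    # Leading empty-named entry copied from the dump; columns in the dump's sorted order.
    columns = [["", "", timestamp_us]]
    for name, value in (("deptid", deptid), ("empid", empid),
                        ("first_name", first_name), ("last_name", last_name)):
        columns.append([name, str(value), timestamp_us])
    return {"key": pkreq.hex, "columns": columns}


def write_json2sstable_file(path, partitions):
    """Write a list of partitions as one JSON array, like the sstable2json output."""
    with open(path, "w") as f:
        json.dump(partitions, f)


if __name__ == "__main__":
    # A fresh random key per row, so a new load does not reuse an existing uuid.
    row = to_json2sstable_partition(uuid.uuid4(), 30101, 298, "scoobie", "doo")
    print(json.dumps([row]))
```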

Now I'll try json2sstable first, with the emp_new_10 file I named (original, I know) emp_new_10.json:

json2sstable -K sampledb_adl -c emp_new_9 /home/tdmcoe_admin/Desktop/emp_new_10.json /var/lib/cassandra/data/sampledb_adl/emp_new_10/sampledb_adl-emp_new_10-jb-1-Data.db

Output printed to the terminal window:

ERROR 08:56:48,581 Unable to initialize MemoryMeter (jamm not specified as javaagent).  This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
Importing 1 keys...
1 keys imported successfully.

I get the MemoryMeter error every time and ignore it; from what I found on Google, it doesn't affect the results.

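So my folder contents didn't change, and 'select * from emp_new_9;' still returns the same single original record. emp_new_10 didn't change either. What exactly happened to my "1 keys imported successfully"? Where was the success?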

Now for the related sstableloader problem. Same base folders and data, but this time running sstableloader:

[root@localhost emp_new_10]# sstableloader -d 127.0.0.1 /var/lib/cassandra/data/sampledb_adl/emp_new_9

Note: I also tried 127.0.0.2 and 127.0.0.1,127.0.0.2, with the same results as below.

Output printed to the terminal window:

ERROR 09:05:07,686 Unable to initialize MemoryMeter (jamm not specified as javaagent).  This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /var/lib/cassandra/data/sampledb_adl/emp_new_9/sampledb_adl-emp_new_9-jb-1-Data.db to [/<my machine ip>]
Streaming session ID: 06a9c1a0-62d6-11e5-b85d-597b365ae56f
progress: [/<my machine ip> 1/1 (100%)] [total: 100% - 0MB/s (avg: 0MB/s)]

So: 100%, yay! 0MB/s, boo!
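The sstableloader call above can be scripted the same way. sstableloader takes the target keyspace and table from the last two directories of the path it is given (here sampledb_adl/emp_new_9), and -d takes the initial hosts as one comma-separated argument. A minimal sketch, with a hypothetical helper name, that builds the argument list without running anything:

```python
import os


def sstableloader_argv(sstable_dir, hosts):
    """Return (keyspace, table, argv) for loading one table directory.

    sstableloader uses the last two directories of the path as the target
    keyspace and table, so the path must end in <keyspace>/<table>.
    """
    parts = os.path.normpath(sstable_dir).split(os.sep)
    if len(parts) < 2 or not parts[-1] or not parts[-2]:
        raise ValueError("expected a path ending in <keyspace>/<table>: %r" % sstable_dir)
    keyspace, table = parts[-2], parts[-1]
    # -d takes the initial hosts as a single comma-separated argument.
    argv = ["sstableloader", "-d", ",".join(hosts), sstable_dir]
    return keyspace, table, argv


if __name__ == "__main__":
    ks, table, argv = sstableloader_argv(
        "/var/lib/cassandra/data/sampledb_adl/emp_new_9", ["127.0.0.1", "127.0.0.2"])
    print(ks, table, argv)
    # To actually stream: import subprocess; subprocess.check_call(argv)
```

Pointing it at the table's own live data directory, as in the question, streams that table's existing sstables back into the same table.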

Now the contents of the emp_new_9 folder, which has a second set of files (generation 2) that I haven't touched:

sampledb_adl-emp_new_9-jb-1-CompressionInfo.db  sampledb_adl-emp_new_9-jb-1-TOC.txt             sampledb_adl-emp_new_9-jb-2-Statistics.db
sampledb_adl-emp_new_9-jb-1-Data.db             sampledb_adl-emp_new_9-jb-2-CompressionInfo.db  sampledb_adl-emp_new_9-jb-2-Summary.db
sampledb_adl-emp_new_9-jb-1-Filter.db           sampledb_adl-emp_new_9-jb-2-Data.db             sampledb_adl-emp_new_9-jb-2-TOC.txt
sampledb_adl-emp_new_9-jb-1-Index.db            sampledb_adl-emp_new_9-jb-2-Filter.db
sampledb_adl-emp_new_9-jb-1-Statistics.db       sampledb_adl-emp_new_9-jb-2-Index.db

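The result of 'select * from emp_new_9;' hasn't changed, and running sstable2json on both data files shows only the one old yogi entry. When I run nodetool compact, the folder goes back to a single set of files containing only the one yogi row. What happened to the 100%?!? 100% of what?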

Thanks for any help. I'm baffled.

Answer 1

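When using json2sstable, you should specify the name of a new, non-existent .db file. SSTables are immutable by design, so updating an existing one through json2sstable is not supported.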

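For whatever reason, the tool doesn't complain when you point it at an existing SSTable. If you specify a new .db file, you'll find that the SSTable files are created with the content you expect.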
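To follow this advice in a script, the output name can be derived from the files already in the table directory so that it never points at an existing SSTable. This sketch assumes the Cassandra 2.0 "jb" naming seen in the question's listings (<keyspace>-<table>-jb-<generation>-<component>) and simply picks the next unused generation number; next_data_file and json2sstable_argv are hypothetical names, and the same table name is passed to -c as the directory being written to.

```python
import os
import re
import tempfile


def next_data_file(table_dir, keyspace, table):
    """Return a Data.db path in table_dir whose generation is not used yet.

    Assumes the "jb" file names shown in the question:
    <keyspace>-<table>-jb-<generation>-<component>.
    """
    pattern = re.compile(r"^%s-%s-jb-(\d+)-" % (re.escape(keyspace), re.escape(table)))
    # Look at every component (Data, Index, TOC, ...) so no generation is reused.
    generations = [int(m.group(1)) for m in map(pattern.match, os.listdir(table_dir)) if m]
    next_gen = max(generations, default=0) + 1
    return os.path.join(table_dir, "%s-%s-jb-%d-Data.db" % (keyspace, table, next_gen))


def json2sstable_argv(keyspace, table, json_path, table_dir):
    """Build a json2sstable command that writes a new SSTable, not an existing one."""
    return ["json2sstable", "-K", keyspace, "-c", table, json_path,
            next_data_file(table_dir, keyspace, table)]


if __name__ == "__main__":
    # Demo in a throwaway directory holding the generation-1 file names from the question.
    with tempfile.TemporaryDirectory() as d:
        for component in ("CompressionInfo.db", "Data.db", "Filter.db", "Index.db",
                          "Statistics.db", "Summary.db", "TOC.txt"):
            open(os.path.join(d, "sampledb_adl-emp_new_10-jb-1-" + component), "w").close()
        print(json2sstable_argv("sampledb_adl", "emp_new_10", "emp_new_10.json", d))
```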

Answer 2

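I figured this out: I was using a table with a uuid field and trying to bulk load rows whose uuid already existed in that field of the table, so it failed. Testing with a text column, everything works!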
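One way to see why re-loading an existing uuid changes nothing: Cassandra does not reject a row whose key already exists; it reconciles the copies cell by cell, keeping the cell with the newer write timestamp. The streamed generation-2 files in the question carried the same key and the same timestamps as generation 1, so merging them (for example during nodetool compact) yields the single original yogi row. Below is a simplified model with hypothetical names; it ignores tombstones and TTLs, and on equal timestamps it keeps the greater value.

```python
from typing import Dict, Tuple

# Simplified model: a cell is (value, write timestamp in microseconds).
Cell = Tuple[str, int]
Partitions = Dict[str, Dict[str, Cell]]  # partition key -> column name -> cell


def reconcile_cell(a: Cell, b: Cell) -> Cell:
    """Last write wins; on equal timestamps, keep the greater value."""
    if a[1] != b[1]:
        return a if a[1] > b[1] else b
    return a if a[0] >= b[0] else b


def merge_partitions(*sources: Partitions) -> Partitions:
    """Merge several SSTables' worth of data into one view, cell by cell."""
    merged: Partitions = {}
    for source in sources:
        for key, row in source.items():
            target = merged.setdefault(key, {})
            for column, cell in row.items():
                target[column] = reconcile_cell(target[column], cell) if column in target else cell
    return merged


if __name__ == "__main__":
    ts = 1443108919841000
    gen1 = {"9c6dd9def6b143129737e9d00b8187f3": {
        "deptid": ("235", ts), "empid": ("30001", ts),
        "first_name": ("yogi", ts), "last_name": ("bear", ts)}}
    gen2 = {k: dict(v) for k, v in gen1.items()}  # streamed copy: same key, same timestamps
    merged = merge_partitions(gen1, gen2)
    print(len(merged), merged == gen1)  # prints: 1 True
```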

Cassandra json2sstable and sstableloader report success, but the data does not change
