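How can I reliably copy a Cassandra database between two servers?

Time: 2017-02-13 18:21:01

Tags: python-2.7 cassandra cqlsh

I have a test setup in which I want a copy of the master data.

I am using the DataStax package of Cassandra, version 3.0.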

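I am using CQLSH to take a dump of the data and restore it in the test setup. I take the copy of the master data using

COPY TO DELIMITER = '\t' AND NULL = 'null' AND QUOTE = '"' AND HEADER = True

and I populate the data using

COPY FROM DELIMITER = '\t' AND NULL = 'null' AND QUOTE = '"' AND HEADER = True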

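After COPY FROM, CQLSH reports that it successfully copied all rows in the file. But when I run count(*) on the table, several rows are missing. The missing rows follow no particular pattern. If I truncate the table and replay the command, a different set of rows goes missing. The number of missing rows is random.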
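One way to narrow down which rows go missing is to export the test table again after the import and compare the primary keys in the two dumps. The following is a minimal sketch of that comparison, not part of the original setup: it assumes Python 3, tab-delimited exports with a header row (as in the COPY options above), and a hypothetical single-column key named `id`. The inline sample data stands in for the two dump files.

```python
import csv
import io


def read_keys(stream, key_column, delimiter="\t"):
    """Return the set of values in key_column of a CSV export with a header row."""
    reader = csv.DictReader(stream, delimiter=delimiter, quotechar='"')
    return {row[key_column] for row in reader}


def missing_keys(source_stream, target_stream, key_column, delimiter="\t"):
    """Return the sorted keys present in the source export but absent from the target."""
    source = read_keys(source_stream, key_column, delimiter)
    target = read_keys(target_stream, key_column, delimiter)
    return sorted(source - target)


if __name__ == "__main__":
    # Hypothetical sample data; in practice, open the two dump files instead.
    master_dump = io.StringIO("id\tdata\n1\tcypher\n2\t2_cypher\n3\tthird\n")
    test_dump = io.StringIO("id\tdata\n1\tcypher\n3\tthird\n")
    print(missing_keys(master_dump, test_dump, "id"))  # prints ['2']
```

Running this after each import shows whether the same keys are lost every time or a different set each run, which is the behaviour described above.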

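The table structure contains lists/sets of user-defined data types, and the UDT contents may contain 'null' values.

Other than programmatically reading and writing individual rows between the two databases, is there another reliable way to copy the data?

Schema of the table (field names changed):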

INSERT INTO `geodb`.`geoareas` (`geoarea`, `zip`, `state`)
(SELECT CONCAT(`uszipcode`.`name`, ' ', `uszipcode`.`state`) as 'geoarea', `uszipcode`.`zip`, `uszipcode`.`state`
FROM `geodb`.`uszipcode`
INNER JOIN
(SELECT `name`, `state`, SUM(`population`) AS 'Population'
FROM `geodb`.`uszipcode`
WHERE `uszipcode`.`state` = State
GROUP BY `name`, `state`
HAVING (SUM(`population`) >= CityMin AND SUM(`population`) <= CityMax)) as `cities`
ON `uszipcode`.`name` = `cities`.`name`
AND `uszipcode`.`state` = `cities`.`state`
ORDER BY `uszipcode`.`name`, `uszipcode`.`zip`);

2 Answers:

Answer 0: (score: 2)

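Besides exporting/importing the data, you can also try copying the data files directly.

  1. Take a snapshot of the data in the original cluster using "nodetool snapshot" (https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsSnapShot.html).
  2. Create the schema on the test cluster.
  3. Load the snapshot from the original cluster into the test cluster:

    a. If all nodes in the test cluster hold all of the data (single node, or 3 nodes with rf=3), or the data volume is small: copy the files from the original cluster into the keyspace/column_family directory and run nodetool refresh (https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsRefresh.html). Make sure the files do not overlap.

    b. If the test cluster nodes do not each hold all of the data, or the data volume is large: use sstableloader to stream the files from the snapshot to the test cluster.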
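The steps above can be sketched as a small script that builds the commands and, by default, only prints them. This is an illustration under assumptions, not a tested procedure: the keyspace `cypher`, table `table1`, snapshot tag `copy_to_test`, host `test-node-1`, and staging directory `/tmp/staging` are all hypothetical, and copying the snapshot files between machines is left out because the paths depend on the installation. For sstableloader, the sketch assumes the snapshot files have been staged in a directory whose last two components are the keyspace and table names.

```python
import posixpath
import subprocess


def snapshot_command(keyspace, tag):
    """Step 1: run on each node of the original cluster."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def refresh_command(keyspace, table):
    """Step 3a: run on the test node after copying the files into the table directory."""
    return ["nodetool", "refresh", keyspace, table]


def sstableloader_command(test_host, staging_dir, keyspace, table):
    """Step 3b: stream staged snapshot files to the test cluster."""
    sstable_dir = posixpath.join(staging_dir, keyspace, table)
    return ["sstableloader", "-d", test_host, sstable_dir]


def run(command, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        subprocess.check_call(command)


if __name__ == "__main__":
    # All names below are placeholders for illustration.
    run(snapshot_command("cypher", "copy_to_test"))
    run(refresh_command("cypher", "table1"))
    run(sstableloader_command("test-node-1", "/tmp/staging", "cypher", "table1"))
```

Only one of steps 3a and 3b is needed for a given test cluster; the dry-run default makes it possible to review the commands before anything touches a node.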

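Answer 1: (score: 0)

I tested your schema with plain COPY TO / COPY FROM, without specifying a delimiter, and it works fine. I ran the test several times and nothing was missing.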

cassandra@cqlsh:cypher> INSERT INTO table1 (id, data, list1, set1 ) VALUES ( 1, 'cypher', ['a',1,'b'], {true}) ;
cassandra@cqlsh:cypher> SELECT * FROM table1 ; 

 id | data   | list1                                                                                                                             | set1
----+--------+-----------------------------------------------------------------------------------------------------------------------------------+--------------------------------
  1 | cypher | [{field1: 'a', field2: null, field3: null}, {field1: '1', field2: null, field3: null}, {field1: 'b', field2: null, field3: null}] | {{field1: True, field2: null}}

cassandra@cqlsh:cypher> INSERT INTO table1 (id, data, list1, set1 ) VALUES ( 2, '2_cypher', ['amp','avd','ball'], {true, false}) ;
cassandra@cqlsh:cypher> SELECT * FROM table1 ;

 id | data     | list1                                                                                                                                    | set1
----+----------+------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------
  1 |   cypher |        [{field1: 'a', field2: null, field3: null}, {field1: '1', field2: null, field3: null}, {field1: 'b', field2: null, field3: null}] |                                {{field1: True, field2: null}}
  2 | 2_cypher | [{field1: 'amp', field2: null, field3: null}, {field1: 'avd', field2: null, field3: null}, {field1: 'ball', field2: null, field3: null}] | {{field1: False, field2: null}, {field1: True, field2: null}}

cassandra@cqlsh:cypher> COPY table1 TO 'table1.csv';
Using 1 child processes

Starting copy of cypher.table1 with columns [id, data, list1, set1].
Processed: 2 rows; Rate:       0 rows/s; Avg. rate:       0 rows/s
2 rows exported to 1 files in 4.358 seconds.
cassandra@cqlsh:cypher> TRUNCATE table table1 ;
cassandra@cqlsh:cypher> SELECT * FROM table1;

 id | data | list1 | set1
----+------+-------+------

cassandra@cqlsh:cypher> COPY table1 FROM 'table1.csv';
Using 1 child processes

Starting copy of cypher.table1 with columns [id, data, list1, set1].
Processed: 2 rows; Rate:       2 rows/s; Avg. rate:       3 rows/s
2 rows imported from 1 files in 0.705 seconds (0 skipped).
cassandra@cqlsh:cypher> SELECT * FROM table1  ;

 id | data     | list1                                                                                                                                    | set1
----+----------+------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------
  1 |   cypher |        [{field1: 'a', field2: null, field3: null}, {field1: '1', field2: null, field3: null}, {field1: 'b', field2: null, field3: null}] |                                {{field1: True, field2: null}}
  2 | 2_cypher | [{field1: 'amp', field2: null, field3: null}, {field1: 'avd', field2: null, field3: null}, {field1: 'ball', field2: null, field3: null}] | {{field1: False, field2: null}, {field1: True, field2: null}}

(2 rows)
cassandra@cqlsh:cypher>
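
When repeating this round trip on a larger table, the summary lines that cqlsh prints can be compared automatically. The sketch below assumes Python 3 and that the summary lines have the same wording as in the transcript above; the function names are made up for illustration. Note that in the question cqlsh reported success even though rows were missing, so this check only catches skipped rows or a count mismatch between export and import, and a count on the table itself is still needed.

```python
import re

# Matches the cqlsh COPY summary lines shown in the transcript above.
_SUMMARY = re.compile(
    r"(?P<rows>\d+) rows (?P<action>imported from|exported to) (?P<files>\d+) files"
    r" in (?P<seconds>[\d.]+) seconds(?: \((?P<skipped>\d+) skipped\))?"
)


def parse_copy_summary(output):
    """Extract the counts from COPY output, or return None if no summary line is found."""
    match = _SUMMARY.search(output)
    if match is None:
        return None
    skipped = match.group("skipped")
    return {
        "action": "import" if match.group("action").startswith("imported") else "export",
        "rows": int(match.group("rows")),
        "files": int(match.group("files")),
        "seconds": float(match.group("seconds")),
        "skipped": int(skipped) if skipped is not None else 0,
    }


def import_matches_export(export_output, import_output):
    """True when the import reports as many rows as the export and skipped none."""
    exported = parse_copy_summary(export_output)
    imported = parse_copy_summary(import_output)
    if exported is None or imported is None:
        return False
    return imported["rows"] == exported["rows"] and imported["skipped"] == 0


if __name__ == "__main__":
    export_line = "2 rows exported to 1 files in 4.358 seconds."
    import_line = "2 rows imported from 1 files in 0.705 seconds (0 skipped)."
    print(import_matches_export(export_line, import_line))  # prints True
```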