Cassandra COPY FROM loses many rows

Time: 2017-04-03 05:57:55

Tags: csv import cassandra

I'm trying to figure out why I lose data when I load a CSV file with COPY FROM. Here is my setup:

% cat /tmp/data 
2-54-2014,"2014-01-01T01:00:00Z","1588.6960767"
2-54-2014,"2014-01-01T01:10:00Z","1587.64072333"
2-54-2014,"2014-01-01T01:20:00Z","1590.48448448"
2-54-2014,"2014-01-01T01:30:00Z","1590.72830295"
2-54-2014,"2014-01-01T01:40:00Z","1582.58896162"
2-54-2014,"2014-01-01T01:50:00Z","1569.62739561"
2-54-2014,"2014-01-01T02:00:00Z","1560.63714579"
2-54-2014,"2014-01-01T02:10:00Z","1551.97991093"
2-54-2014,"2014-01-01T02:20:00Z","1576.29093944"
2-54-2014,"2014-01-01T02:30:00Z","1584.34574486"

% cqlsh -k hats                                                           
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh:hats> CREATE TABLE power_1turb (
            id TEXT, ts TIMESTAMP, value DOUBLE, PRIMARY KEY ((id), ts));

Now I try to load the data file into Cassandra:

cqlsh:hats> COPY power_1turb (id, ts, value) FROM '/tmp/data';
Using 7 child processes

Starting copy of hats.power_1turb with columns [id, ts, value].
Processed: 10 rows; Rate:      16 rows/s; Avg. rate:      24 rows/s
10 rows imported from 1 files in 0.412 seconds (0 skipped).


cqlsh:hats> select * from power_1turb ;

 id        | ts                              | value
-----------+---------------------------------+------------
 2-54-2014 | 2013-12-31 18:00:00.000000+0000 | 1560.63715

(1 rows)

Why does it load only 1 row, and why is it always the same row from the middle of the data? If I run a query such as insert into power_1turb (id, ts, value) values ('2-54-2014','2014-01-01T01:30:00Z',1590.72830295); it populates the table just fine.

1 answer:

Answer 0 (score: 4)

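Specify DATETIMEFORMAT in the COPY command, because the timestamp format in your file does not match cqlsh's default datetime format.

For your case, use the following COPY command: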

COPY power_1turb (id, ts, value) FROM 'data' WITH DATETIMEFORMAT = '%Y-%m-%dT%H:%M:%SZ';

Tested with the cqlsh that ships with Cassandra 2.2.5:
cassandra@cqlsh:test> SELECT * FROM power_1turb ;

 id        | ts                       | value
-----------+--------------------------+------------
 2-54-2014 | 2014-01-01 01:00:00+0000 | 1588.69608
 2-54-2014 | 2014-01-01 01:10:00+0000 | 1587.64072
 2-54-2014 | 2014-01-01 01:20:00+0000 | 1590.48448
 2-54-2014 | 2014-01-01 01:30:00+0000 |  1590.7283
 2-54-2014 | 2014-01-01 01:40:00+0000 | 1582.58896
 2-54-2014 | 2014-01-01 01:50:00+0000 |  1569.6274
 2-54-2014 | 2014-01-01 02:00:00+0000 | 1560.63715
 2-54-2014 | 2014-01-01 02:10:00+0000 | 1551.97991
 2-54-2014 | 2014-01-01 02:20:00+0000 | 1576.29094
 2-54-2014 | 2014-01-01 02:30:00+0000 | 1584.34574

(10 rows)
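
One possible reading of the gap between this result and the single row in the question: Cassandra writes are upserts, so rows that share the same primary key (id, ts) overwrite one another. If every timestamp were misparsed to the same value (an assumption here, not verified against cqlsh internals), ten imported rows would collapse into one while still being reported as imported. A minimal Python sketch, with a dict standing in for the table and a hypothetical constant-returning parser standing in for the bad parse:

```python
# Three sample rows from /tmp/data: (id, ts, value).
rows = [
    ("2-54-2014", "2014-01-01T01:00:00Z", 1588.6960767),
    ("2-54-2014", "2014-01-01T01:10:00Z", 1587.64072333),
    ("2-54-2014", "2014-01-01T01:20:00Z", 1590.48448448),
]


def load(rows, parse_ts):
    """Mimic upsert semantics: the primary key (id, ts) maps to one value."""
    table = {}
    for id_, ts, value in rows:
        # A later write with the same key replaces the earlier one.
        table[(id_, parse_ts(ts))] = value
    return table


# Hypothetical bad parser: every timestamp ends up as the same instant.
collapsed = load(rows, lambda ts: "2013-12-31 18:00:00")

# Parser that keeps timestamps distinct: every row survives.
distinct = load(rows, lambda ts: ts)

print(len(collapsed), len(distinct))  # 1 3
```

The dict says nothing about which of the colliding rows survives in a real cluster; it only shows why the imported count and the stored count can differ.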

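Relevant documentation for the Cassandra 2.2.5 cqlsh:

DATETIMEFORMAT, which used to be called TIMEFORMAT, a string containing the Python strftime format for date and time values, such as '%Y-%m-%d %H:%M:%S%z'. It defaults to the time_format value in cqlshrc.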
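
Since DATETIMEFORMAT is a Python strftime-style string, a candidate format can be checked against a sample value from the CSV before running the import. The sketch below uses the standard library's datetime.strptime in standalone Python 3; it is not cqlsh's own parser, and the helper name matches is made up for illustration.

```python
from datetime import datetime

SAMPLE = "2014-01-01T01:00:00Z"          # a timestamp as it appears in /tmp/data
DOC_EXAMPLE = "%Y-%m-%d %H:%M:%S%z"      # example format quoted in the docs
PROPOSED = "%Y-%m-%dT%H:%M:%SZ"          # format used in the COPY command above


def matches(text, fmt):
    """Return True if strptime can parse text with fmt (hypothetical helper)."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


# The sample has a literal 'T' where the documented example expects a space.
print(matches(SAMPLE, DOC_EXAMPLE))
# The proposed format treats 'T' and 'Z' as literal characters.
print(matches(SAMPLE, PROPOSED))

parsed = datetime.strptime(SAMPLE, PROPOSED)
```

Because the trailing Z is matched as a literal, strptime returns a naive datetime here; how cqlsh then assigns a time zone is outside what this sketch shows.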