Question

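I'm trying to import a CSV file into a Cassandra table, but I've run into a problem. After the import succeeds (or at least that is what Cassandra reports), I still can't see any records. Here are some details: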

cqlsh:recommendation_engine> COPY row_historical_game_outcome_data  FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|';

2 rows imported in 0.216 seconds.
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data;

 customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------

(0 rows)
cqlsh:recommendation_engine>

This is what my data looks like:

'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0|
'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|456456|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0|

The Cassandra version is apache-cassandra-2.2.0.

Edited:

CREATE TABLE row_historical_game_outcome_data (
    customer_id int,
    game_id int,
    time timestamp,
    channel text,
    currency_code text,
    game_code text,
    game_name text,
    game_type text,
    game_vendor text,
    progressive_winnings double,
    stake_amount double,
    win_amount double,
    PRIMARY KEY ((customer_id, game_id, time))
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

I also tried the following, as suggested by uri2x, but still nothing:

select * from row_historical_game_outcome_data;

 customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------

(0 rows)
cqlsh:recommendation_engine> COPY row_historical_game_outcome_data ("game_vendor","game_id","game_code","game_name","game_type","channel","customer_id","stake_amount","win_amount","currency_code","time","progressive_winnings")  FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|';

2 rows imported in 0.192 seconds.
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data;

 customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------

(0 rows)

Answer 1

OK, I had to change a few things about your data file to get this to work:

SomeName|673|SomeName|SomeName|TYPE|M|123123|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0
SomeName|673|SomeName|SomeName|TYPE|M|456456|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0

Removed the trailing pipe.
Truncated the time to seconds.
Removed all single quotes.
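
For a larger file, the three changes above could be scripted rather than made by hand. The following is a minimal Python sketch, not part of the original answer: the function name `clean_line` is hypothetical, and it assumes that no field contains the delimiter itself and that timestamps look like `YYYY-MM-DD HH:MM:SS.fraction`.

```python
import re

# Matches a whole field such as "2015-07-01 00:01:42.19700" and captures
# the part up to whole seconds.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+$")

def clean_line(line, delimiter="|"):
    """Apply the three changes to a single CSV line (hypothetical helper)."""
    line = line.rstrip("\n")
    # Remove the trailing pipe, if there is one.
    if line.endswith(delimiter):
        line = line[:-len(delimiter)]
    fields = []
    for field in line.split(delimiter):
        # Remove the single quotes wrapped around the value.
        field = field.strip("'")
        # Truncate timestamps to whole seconds; other fields are unchanged.
        field = TIMESTAMP.sub(r"\1", field)
        fields.append(field)
    return delimiter.join(fields)

raw = "'SomeName'|673|'TYPE'|123123|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0|"
print(clean_line(raw))
```

Running each line of the input file through `clean_line` and writing the results to a new file would produce data in the same shape as the two rows shown above.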

Once I had done that, I executed:

aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data 
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount,
 win_amount,currency_code , time , progressive_winnings) 
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|';

Improper COPY command.

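This one was a little tricky. I finally figured out that COPY did not like the column name time. I changed the table to use the name game_time instead, and then re-ran the COPY: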

aploetz@cqlsh:stackoverflow> DROP TABLE row_historical_game_outcome_data ;
aploetz@cqlsh:stackoverflow> CREATE TABLE row_historical_game_outcome_data (
             ...     customer_id int,
             ...     game_id int,
             ...     game_time timestamp,
             ...     channel text,
             ...     currency_code text,
             ...     game_code text,
             ...     game_name text,
             ...     game_type text,
             ...     game_vendor text,
             ...     progressive_winnings double,
             ...     stake_amount double,
             ...     win_amount double,
             ...     PRIMARY KEY ((customer_id, game_id, game_time))
             ... );

aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount,
 win_amount,currency_code , game_time , progressive_winnings)
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|';

3 rows imported in 0.738 seconds.
aploetz@cqlsh:stackoverflow> SELECT * FROM row_historical_game_outcome_data ;

 customer_id | game_id | game_time                | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+--------------------------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------
      123123 |     673 | 2015-07-01 00:01:42-0500 |       M |           GBP |  SomeName |  SomeName |      TYPE |    SomeName |                    0 |          0.2 |          0
      456456 |     673 | 2015-07-01 00:01:42-0500 |       M |           GBP |  SomeName |  SomeName |      TYPE |    SomeName |                    0 |          0.2 |          0

(2 rows)

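I'm not sure why it says "3 rows imported"; my guess is that it is counting the header row.
All of your keys are partition keys. I'm not sure whether you realized that. I only point it out because I can't think of a reason to specify multiple partition keys without also specifying a clustering key.
I couldn't find anything in the DataStax documentation indicating that "time" is a reserved word. It might be a bug in cqlsh. But seriously, you should probably name your time-based columns something other than "time" anyway.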

Answer 2

One more comment. COPY in CQL supports WITH HEADER = TRUE, which causes the header row (the first line) of the CSV file to be ignored. (http://docs.datastax.com/en/cql/3.3/cql/cql_reference/copy_r.html)

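"time" is not a reserved word in CQL (trust me, I updated the list of CQL reserved words in the DataStax documentation myself). However, your COPY command does show spaces between the column names, around the column name "time", and I think that is the problem. Use no spaces, only commas, and do the same for every row in the CSV file. (http://docs.datastax.com/en/cql/3.3/cql/cql_reference/keywords_r.html)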
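
To rule out stray spaces in the column list, the COPY statement could be generated instead of typed. This is only a sketch under assumptions: `build_copy_command` is a hypothetical helper, not part of cqlsh or any driver, and whether the spaces really cause the failure is this answer's hypothesis rather than something confirmed here.

```python
def build_copy_command(table, columns, csv_path, delimiter="|", header=False):
    """Build a cqlsh COPY ... FROM statement as a string (hypothetical helper)."""
    # Join the column names with bare commas, dropping any surrounding spaces.
    column_list = ",".join(name.strip() for name in columns)
    options = ["DELIMITER='{}'".format(delimiter)]
    if header:
        # Only set this when the file really starts with a header row;
        # otherwise the first data row would be skipped.
        options.append("HEADER=TRUE")
    return "COPY {} ({}) FROM '{}' WITH {};".format(
        table, column_list, csv_path, " AND ".join(options)
    )

print(build_copy_command(
    "row_historical_game_outcome_data",
    ["game_vendor", "game_id", " time ", "progressive_winnings"],
    "/home/adelin/workspace/docs/re_raw_data2.csv",
))
```

The printed statement can then be pasted into cqlsh. The column list here is shortened for illustration; a real call would name all twelve columns in the order they appear in the file.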

Answer 3

There are two things in your CSV file that bother cqlsh:

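Remove the trailing | at the end of each CSV row.
Remove the microseconds from your time values (the precision should be at most milliseconds).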
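
Before re-running the import, it may help to check which lines still have either problem. Below is a small Python sketch of such a check. `find_problems` is a hypothetical name, and the check assumes times are written as `HH:MM:SS.fraction`.

```python
import re

# A time of day followed by a fractional part, e.g. "00:01:42.19700".
FRACTION = re.compile(r"\d{2}:\d{2}:\d{2}\.(\d+)")

def find_problems(line, delimiter="|"):
    """Return a list of the two problems described above found in one line."""
    problems = []
    line = line.rstrip("\n")
    if line.endswith(delimiter):
        problems.append("trailing delimiter")
    for match in FRACTION.finditer(line):
        # More than three fractional digits is finer than milliseconds.
        if len(match.group(1)) > 3:
            problems.append("sub-millisecond precision: " + match.group(0))
    return problems

print(find_problems("'GBP'|2015-07-01 00:01:42.19700|0.0|"))
print(find_problems("GBP|2015-07-01 00:01:42.197|0.0"))
```

The first call reports both problems for a row shaped like the original data; the second returns an empty list because the row has no trailing pipe and only millisecond precision.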

No rows inserted into the table when importing from CSV in Cassandra
