I have the following Cassandra table:
CREATE TABLE segments (
b text,
s int,
c int,
PRIMARY KEY (b)
)
and the following Pig relation:
data: {b: chararray,s: long,c: long}
which I load from a file stored with PigStorage:
data = LOAD 'some_file' as (b:chararray,s:long,c:long);
I am trying to store the Pig relation into the Cassandra table, without success. This is what I tried:
to_cassandra = FOREACH (GROUP data ALL)
GENERATE
TOTUPLE(TOTUPLE('b',data.b)),
TOTUPLE('s',data.s),
TOTUPLE('c',data.c);
STORE to_cassandra INTO
'cql://pv/segments?
output_query=UPDATE%20pv.segments%20SET%20s%3D%3F%2Cc%3D%3F'
USING CqlStorage();
where the decoded output query is:
UPDATE pv.segments SET s=?,c=?
But I get the following error:
[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats -
ERROR: java.lang.ClassCastException:
org.apache.pig.data.DefaultDataBag cannot be cast to org.apache.pig.data.DataByteArray
This is rather cryptic. Which field is the offending one, and how can I fix this?
Edit
I ran illustrate to_cassandra;
and got:
-----------------------------------------------------------------------------------------------------
| data | b:chararray | s:long | c:long |
-----------------------------------------------------------------------------------------------------
| | 03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB | 1 | 1 |
| | 0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG | 1 | 1 |
-----------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1-3 | group:chararray | data:bag{:tuple(b:chararray,s:long,c:long)} |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| | all | {(03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB, 1, 1), (0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG, 1, 1)} |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| to_cassandra | org.apache.pig.builtin.totuple_org.apache.pig.builtin.totuple_29_30:tuple(org.apache.pig.builtin.totuple_29:tuple(:chararray,:bag{:tuple(b:chararray)})) | org.apache.pig.builtin.totuple_31:tuple(:chararray,:bag{:tuple(s:long)}) | org.apache.pig.builtin.totuple_32:tuple(:chararray,:bag{:tuple(c:long)}) |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| | ((b, {(03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB), (0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG)})) | (s, {(1), (1)}) | (c, {(1), (1)}) |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Answer 0 (score: 0)
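The problem is your grouping. GROUP data ALL collapses the whole relation into a single row, so data.b, data.s and data.c are each a bag of values rather than the single value Cassandra expects. That is what the ClassCastException is reporting: a DefaultDataBag arrived where a scalar was expected. Your output should end up looking like this: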
((b, 03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB)), (s, 1), (c, 1)
...to match your schema. Since the output schema corresponds directly to your input, the purpose of the grouping is unclear.
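A minimal sketch of what dropping the grouping could look like, reusing the relation and field names from the question. It only shows how to produce one tuple per input row in the shape given above. Whether CqlStorage accepts exactly this layout for the bound values is an assumption, so check it against the CqlStorage documentation for your Cassandra version before relying on it.

```pig
-- Load the rows exactly as in the question.
data = LOAD 'some_file' AS (b:chararray, s:long, c:long);

-- Build one output tuple per input row. There is no GROUP here,
-- so b, s and c are scalars rather than bags.
to_cassandra = FOREACH data GENERATE
    TOTUPLE(TOTUPLE('b', b)),
    TOTUPLE('s', s),
    TOTUPLE('c', c);

-- Inspect the resulting schema before attempting the STORE.
DESCRIBE to_cassandra;
```

Running illustrate to_cassandra; on this version should show plain chararray and long values inside the tuples, with no bag{...} entries.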