Question

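I am working on a project that uses Cassandra 1.2 and Hadoop 1.2.

I have written a regular Cassandra mapper and reducer, but I would like to create my own InputFormat class that reads records from Cassandra and extracts the required column values by splitting each value and picking fields by index. So I plan to write a custom InputFormat class, but I am confused about how to do it: which classes should I extend and implement, and how do I get the row key, column names, column values, and so on?

My Mapper class is as follows: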

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.SortedMap;

    import org.apache.cassandra.db.IColumn;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MyMapper extends
            Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, Text> {
        private Text word = new Text();
        MyJDBC db = new MyJDBC();

        // like, myhits, d1, d2, condition1, condition2 and getDateDiffe()
        // are not declared in this excerpt.
        @Override
        public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns,
                Context context) throws IOException, InterruptedException {

            // The row key holds the id as a string.
            long std_id = Long.parseLong(ByteBufferUtil.string(key));
            long newSavePoint = 0;
            // Assumed to be a per-row buffer; its declaration was not shown.
            StringBuilder sb = new StringBuilder();
            if (columns.values().isEmpty()) {
                System.out.println("EMPTY ITERATOR");
                sb.append("column_N/A" + ":" + "N/A" + " , ");
            } else {
                for (IColumn cell : columns.values()) {
                    String name = ByteBufferUtil.string(cell.name());
                    String value = null;
                    // Columns whose name contains "int" hold integers, the rest strings.
                    if (name.contains("int")) {
                        value = String.valueOf(ByteBufferUtil.toInt(cell.value()));
                    } else {
                        value = ByteBufferUtil.string(cell.value());
                    }
                    String[] data = value.split(",");
                    // if (data[0].equalsIgnoreCase("login")) {
                    Long[] dif = getDateDiffe(d1, d2);

                    // This is the logic I want to perform inside my custom input
                    // class rather than here; I just want a simple mapper class.
                    if (condition1 && condition2) {
                        myhits++;
                        sb.append(":\t " + data[0] + "  " + data[2] + "  " + data[1] /* + " " + data[3] */ + "\n");
                        newSavePoint = d2;
                    }
                }
                sb.append("~" + like + "~" + newSavePoint + "~");
                word.set(sb.toString().replace("\t", ""));
            }

            db.setInterval(std_id, newSavePoint);
            db.setHits(std_id, like + "");
            context.write(new Text(ByteBufferUtil.string(key)), word);
        }
    }

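I want to slim down the Mapper class and perform the same computation in the custom input class instead.

Please help. I am hoping for a positive response from the Stack Overflow community.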

Answer 1

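You can accomplish this by moving the Mapper logic into a custom input class (as you have already pointed out).

I found this nice post, which explains a similar problem. I think it may solve your problem.

Create a custom InputFormat based on ColumnFamilyInputFormat for Cassandra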
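
To make the idea of moving the split-and-index logic out of the mapper more concrete, here is a minimal sketch of a wrapping reader. It does not use the real Hadoop or Cassandra classes: `SimpleRecordReader`, `InMemoryRowReader`, and `FieldExtractingReader` are hypothetical stand-ins that only mirror the `nextKeyValue` / `getCurrentKey` / `getCurrentValue` shape of Hadoop's `RecordReader`, and the field order 0, 2, 1 is taken from the mapper in the question.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mirrors the shape of Hadoop's RecordReader<K, V>.
// It is NOT the real Hadoop class.
interface SimpleRecordReader<K, V> {
    boolean nextKeyValue();
    K getCurrentKey();
    V getCurrentValue();
}

// Hypothetical stand-in for the underlying reader. It yields raw rows as
// row key -> (column name -> comma-separated column value).
class InMemoryRowReader implements SimpleRecordReader<String, Map<String, String>> {
    private final Iterator<Map.Entry<String, Map<String, String>>> rows;
    private Map.Entry<String, Map<String, String>> current;

    InMemoryRowReader(Map<String, Map<String, String>> data) {
        this.rows = data.entrySet().iterator();
    }

    public boolean nextKeyValue() {
        if (!rows.hasNext()) {
            return false;
        }
        current = rows.next();
        return true;
    }

    public String getCurrentKey() {
        return current.getKey();
    }

    public Map<String, String> getCurrentValue() {
        return current.getValue();
    }
}

// Wraps a delegate reader and performs the split-and-index extraction,
// so the consumer (the mapper) only sees the already-selected fields.
class FieldExtractingReader implements SimpleRecordReader<String, List<String>> {
    private final SimpleRecordReader<String, Map<String, String>> delegate;
    private final int[] wantedIndexes;
    private List<String> current;

    FieldExtractingReader(SimpleRecordReader<String, Map<String, String>> delegate,
                          int... wantedIndexes) {
        this.delegate = delegate;
        this.wantedIndexes = wantedIndexes;
    }

    public boolean nextKeyValue() {
        if (!delegate.nextKeyValue()) {
            return false;
        }
        current = new ArrayList<>();
        for (String raw : delegate.getCurrentValue().values()) {
            String[] fields = raw.split(",");
            StringBuilder sb = new StringBuilder();
            for (int idx : wantedIndexes) {
                if (idx < fields.length) { // skip indexes the value does not have
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(fields[idx].trim());
                }
            }
            current.add(sb.toString());
        }
        return true;
    }

    public String getCurrentKey() {
        return delegate.getCurrentKey();
    }

    public List<String> getCurrentValue() {
        return current;
    }
}
```

In a real job the same wrapping would presumably be done by subclassing `org.apache.cassandra.hadoop.ColumnFamilyInputFormat`, overriding `createRecordReader` to return a reader that delegates to the one Cassandra provides, and registering the subclass with `job.setInputFormatClass(...)`. I have not tested that against Cassandra 1.2, so treat it as a starting point rather than a finished implementation.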
