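I am running into a problem when inserting data into HBase. I am running Scala code in a Spark shell on Google Cloud and trying to insert data from an RDD into HBase (Bigtable).
Format of hbaseRDD: RDD[(String, Map[String, String])]
The String is the row ID, and the Map contains the corresponding columns and their values.
The code looks like this: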
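import com.google.cloud.bigtable.hbase.BigtableConfiguration
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes
val tableName: String = "omniture"
val connection = BigtableConfiguration.connect("*******", "**********")
val admin = connection.getAdmin()
val table = connection.getTable(TableName.valueOf(tableName))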
Try 1:
hbaseRDD.foreach { w =>
  val put = new Put(Bytes.toBytes(w._1))
  val columnValues = w._2
  columnValues.foreach { x =>
    put.addColumn(Bytes.toBytes("u"), Bytes.toBytes(x._1), Bytes.toBytes(x._2))
  }
  table.put(put)
}
Try 2:
hbaseRDD.map { w =>
  val put = new Put(Bytes.toBytes(w._1))
  val columnValues = w._2
  columnValues.map { x =>
    put.addColumn(Bytes.toBytes("u"), Bytes.toBytes(x._1), Bytes.toBytes(x._2))
  }
  table.put(put)
}
Below is the error I get:
org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: com.google.cloud.bigtable.hbase.BigtableTable
Serialization stack:
- object not serializable (class: com.google.cloud.bigtable.hbase.BigtableTable, value: BigtableTable{hashCode=0x7d96618, project=cdp-dev-201706-01, instance=cdp-dev-cl-hbase-instance, table=omniture, host=bigtable.googleapis.com})
- field (class: logic.ingestion.Ingestion$$anonfun$insertTransactionData$1, name: table$1, type: interface org.apache.hadoop.hbase.client.Table)
- object (class logic.ingestion.Ingestion$$anonfun$insertTransactionData$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
... 27 more
Any help would be greatly appreciated. Thanks in advance.
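Answer 0 (score: 0)
Reference: Writing to HBase via Spark: Task not serializable
Below is the correct approach. The connection and table are created inside foreachPartition, so they are built on each executor instead of being captured by the closure and serialized from the driver: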
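hbaseRDD.foreachPartition { partition =>
  // Create the connection on the executor; it is not serializable, so it cannot be shipped from the driver.
  val tableName: String = "omniture"
  val connection = BigtableConfiguration.connect("cdp-dev-201706-01", "cdp-dev-cl-hbase-instance")
  val table = connection.getTable(TableName.valueOf(tableName))
  try {
    partition.foreach { case (rowKey, columns) =>
      val put = new Put(Bytes.toBytes(rowKey))
      columns.foreach { case (qualifier, value) =>
        put.addColumn(Bytes.toBytes("u"), Bytes.toBytes(qualifier), Bytes.toBytes(value))
      }
      table.put(put)
    }
  } finally {
    // Release the per-partition resources once all rows are written.
    table.close()
    connection.close()
  }
}
// foreachPartition is an action, so a trailing hbaseRDD.collect() is not needed to trigger the writes.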
The link above explains this in detail.
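To see why moving the connection inside the closure helps, here is a minimal sketch that needs neither Spark nor HBase. It assumes a hypothetical FakeTable class standing in for BigtableTable (it is not part of any library) and uses plain Java serialization, which is what Spark applies to task closures by default. ClosureDemo, isSerializable, capturingClosure and selfContainedClosure are likewise illustrative names, not anything from the original code.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a client handle such as BigtableTable.
// Like the real handle, it does NOT implement Serializable.
class FakeTable(val name: String) {
  def put(row: String): Unit = ()
}

object ClosureDemo {
  // Attempt plain Java serialization of an object and report whether it succeeds.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Mirrors Try 1: the handle is created outside and captured by the closure.
  def capturingClosure(): String => Unit = {
    val table = new FakeTable("omniture")
    row => table.put(row)
  }

  // Mirrors the foreachPartition fix: only a String is captured,
  // and the handle is created inside the closure body.
  def selfContainedClosure(): Iterator[String] => Unit = {
    val tableName = "omniture"
    rows => {
      val table = new FakeTable(tableName)
      rows.foreach(table.put)
    }
  }

  def main(args: Array[String]): Unit = {
    println(s"capturing closure serializable: ${isSerializable(capturingClosure())}")
    println(s"self-contained closure serializable: ${isSerializable(selfContainedClosure())}")
  }
}
```

The first closure carries the FakeTable instance as captured state, so serializing it fails in the same way the stack trace in the question reports for the table$1 field. The second closure captures only a String, so it can be shipped to executors, and each partition then builds its own handle.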