I am trying to create an HBase connection inside Spark's MapPartitionsFunction, but it fails with:
Caused by: java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
I tried the following code:
SparkConf conf = new SparkConf()
        .setAppName("EnterPrise Risk Score")
        .setMaster("local");
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
conf.set("spark.kryo.registrationRequired", "true");
conf.registerKryoClasses(new Class<?>[] {
        Class.forName("org.apache.hadoop.conf.Configuration"),
        Class.forName("org.apache.hadoop.hbase.client.Table"),
        Class.forName("com.databricks.spark.avro.DefaultSource$SerializableConfiguration") });

SparkSession sparkSession = SparkSession.builder()
        .config(conf)
        .getOrCreate();

Configuration hbaseConf = HBaseConfiguration.create(hadoopConf);
I use sparkSession to create a Dataset and pass hbaseConf into the partition function to create the connection to HBase. Is there any way to connect to HBase?
Answer (score: 1)
You are most likely capturing the HBase configuration in the closure of a Spark action, implicitly shipping it to the executors like this:
Configuration hbaseConfiguration = HBaseConfiguration.create();

sc.hadoopFile(inDirTrails, AvroInputFormat.class, AvroWrapper.class, NullWritable.class)
    .mapPartitions(i -> {
        Connection connection = ConnectionFactory.createConnection(hbaseConfiguration);
        // more valid code
    });
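This is what triggers the exception: Spark serializes the whole lambda, including every object it captures, with plain Java serialization, and org.apache.hadoop.conf.Configuration does not implement Serializable. Registering the class with Kryo does not help either, because spark.serializer only applies to shuffled data; closures always go through the Java serializer.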
Why not create the configuration directly inside the partition function instead:
sc.hadoopFile(inDirTrails, AvroInputFormat.class, AvroWrapper.class, NullWritable.class)
    .mapPartitions(i -> {
        Configuration hbaseConfiguration = HBaseConfiguration.create();
        hbaseConfiguration.set("hbase.zookeeper.quorum", HBASE_ZOOKEEPER_QUORUM);
        Connection connection = ConnectionFactory.createConnection(hbaseConfiguration);
        // more valid code
    });
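For completeness, here is a minimal self-contained sketch of the same pattern on the Dataset API the question uses. The class name, the risk_scores table, and the cf:score column are illustrative placeholders, not from the original; replace them and the zkQuorum parameter with your own values. The point is that only serializable values (here, a String) are captured, the Configuration is built per partition, and the Connection is closed via try-with-resources.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.function.MapPartitionsFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;

public class HBaseEnrichment {
    // Enrich each row key in the Dataset with a value looked up from HBase.
    // The Configuration is built *inside* the function, so nothing
    // non-serializable is captured in the closure.
    static Dataset<String> enrich(Dataset<String> rowKeys, String zkQuorum) {
        return rowKeys.mapPartitions((MapPartitionsFunction<String, String>) keys -> {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", zkQuorum);

            List<String> out = new ArrayList<>();
            // One Connection per partition; closed by try-with-resources
            // before the iterator is handed back to Spark.
            // Table and column names below are hypothetical.
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("risk_scores"))) {
                while (keys.hasNext()) {
                    String key = keys.next();
                    Result result = table.get(new Get(Bytes.toBytes(key)));
                    byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("score"));
                    out.add(key + "," + (value == null ? "" : Bytes.toString(value)));
                }
            }
            return out.iterator();
        }, Encoders.STRING());
    }
}

Buffering the partition into a list keeps the sketch simple and guarantees the connection is closed only after every row has been read; for very large partitions you would stream the results instead and close the connection once the iterator is exhausted.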