Spark cannot read a full HBase row; it only reads the value of the last column

Date: 2018-05-02 12:00:39

Tags: python apache-spark pyspark hbase

Why do I not get the complete HBase data in the terminal?

host = 'localhost'
table = 'student'
conf = {"hbase.zookeeper.quorum": host, "hbase.mapreduce.inputtable": table}
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=conf)
hbase_rdd.collect()
[('1', '23'), ('2', '24'), ('3', '10')]
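For comparison, newer builds of the Spark examples jar ship a `HBaseResultToStringConverter` that emits all cells of a row as one string with one cell per line, rather than only a single value. A minimal sketch of unpacking that format into per-cell pairs, assuming that newline-separated layout and using plain Python lists in place of the RDD so no cluster is needed:

```python
def unpack(record):
    """Split one (rowkey, multi-line value) pair into per-cell pairs."""
    rowkey, value = record
    return [(rowkey, cell) for cell in value.split("\n")]

# Simulated collect() output from a converter that returns every cell,
# newline-separated (values here mirror the question's 'student' table):
records = [
    ("1", "23\nF\nlihuan"),
    ("2", "24\nF\nsunzhesi"),
]

# Flatten each row into one (rowkey, cell) pair per cell.
cells = [pair for rec in records for pair in unpack(rec)]

# On the real RDD this would be: hbase_rdd.flatMap(unpack).collect()
```

If only one value per row comes back, as in the output above, the converter on the classpath is likely an older one that returns a single cell's value, so swapping in a newer examples jar is worth trying.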

The raw data in HBase is as follows:

ROW                   COLUMN+CELL                                               
1                    column=info:age, timestamp=1525153512915, value=23        
1                    column=info:gender, timestamp=1525153501730, value=F      
1                    column=info:name, timestamp=1525153481472, value=lihuan   
2                    column=info:age, timestamp=1525153553378, value=24        
2                    column=info:gender, timestamp=1525153542869, value=F      
2                    column=info:name, timestamp=1525153531737, value=sunzhesi 
3                    column=info:age, timestamp=1525157971696, value=10        
3                    column=info:gender, timestamp=1525157958967, value=M      
3                    column=info:name, timestamp=1525157941132, value=axin

System environment: Ubuntu 16.04; Python 3.5.2; Spark 2.3.0; Hadoop 2.9.0; HBase 1.4.2

1 Answer:

Answer 0 (score: 0)

I'm actually not sure what happens when you use newAPIHadoopRDD, but when I scan data from HBase I add "hbase.mapreduce.scan" to the conf, so maybe try adding that key as well.
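As a sketch of that idea: "hbase.mapreduce.scan" expects a Base64-serialized Scan object, which is awkward to build from Python, but TableInputFormat also reads plain string keys such as "hbase.mapreduce.scan.columns" (a space-separated column list) that shape the scan directly. A minimal conf assuming the question's 'student' table:

```python
host = "localhost"
table = "student"

conf = {
    "hbase.zookeeper.quorum": host,
    "hbase.mapreduce.inputtable": table,
    # Ask the scan for every column of the 'info' family explicitly;
    # TableInputFormat reads this space-separated list as SCAN_COLUMNS.
    "hbase.mapreduce.scan.columns": "info:age info:gender info:name",
}

# This conf dict is then passed to sc.newAPIHadoopRDD(...) exactly as in
# the question; whether all columns come back still depends on what the
# value converter does with the multi-cell Result.
```

Note this only controls which cells the scan fetches; if the converter still collapses the Result to one value, the fix lies on the converter side.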