HBase Spark Connector

Posted: 2016-06-28 06:08:39

Tags: apache-spark hbase

hbase(main):011:0> scan 'iemployee'
ROW       COLUMN+CELL    

 1        column=insurance:dental, timestamp=1457693003758, value=metlife                                                                                        
 1        column=insurance:health, timestamp=1457693003727, value=anthem                                                                                         
 1        column=insurance:life, timestamp=1457693003814, value=metlife                                                                                          
 1        column=insurance:vision, timestamp=1457693003786, value=visionOne                                                                                      
 1        column=payroll:grade, timestamp=1457693003619, value=G16                                                                                               
 1        column=payroll:salary, timestamp=1457693003647, value=250000.00                                                                                        
 1        column=personal:city, timestamp=1457693003536, value=San Fransisco                                                                                     
 1        column=personal:fname, timestamp=1457693003430, value=Mike                                                                                             
 1        column=personal:lname, timestamp=1457693003503, value=Young                                                                                            
 1        column=personal:zip, timestamp=1457693003590, value=12345                                                                                              
 1        column=skills:interpersonal-rating, timestamp=1457693003694, value=medium                                                                              
 1        column=skills:management, timestamp=1457693003669, value=executive,creator,innovative

I got this data from the table `iemployee` (a default table that ships with Hortonworks HBase), and I am trying to query it with Spark.

I have the following code in Spark:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

import org.apache.spark._

object HbaseTest {
        def main(args: Array[String]): Unit = {
                val sparkConf = new SparkConf().setAppName("Spark-Hbase").setMaster("local[2]")
                val sc = new SparkContext(sparkConf)

                //This anonymous function builds the configuration passed to newAPIHadoopRDD below
                val hBaseConfig = (hbaseSiteXml: String, tableName: String) => {
                        val hbaseConfiguration = HBaseConfiguration.create()
                        // Wrap in a Path: the String overload of addResource looks
                        // the name up on the classpath, not on the local filesystem
                        hbaseConfiguration.addResource(new Path(hbaseSiteXml))
                        hbaseConfiguration.set(TableInputFormat.INPUT_TABLE, tableName)
                        hbaseConfiguration
                }

                val rdd = sc.newAPIHadoopRDD(
                                hBaseConfig("/etc/ams-hbase/conf/hbase-site.xml", "iemployee"),
                                classOf[TableInputFormat],
                                classOf[ImmutableBytesWritable],
                                classOf[Result]
                )

                // getValue returns the latest cell value for the given family:qualifier
                rdd.map(_._2)
                   .map(result => Bytes.toString(result.getValue(Bytes.toBytes("insurance"), Bytes.toBytes("dental"))))
                   .collect()
                   .foreach(println)
        }
}

Question:

The path I pass as the `hbaseSiteXml` argument to the anonymous function `hBaseConfig` (used in `newAPIHadoopRDD`) points to an `hbase-site.xml` that lives on another server, which I access with `ssh root@92.168.21.128`. Is there a way to specify that remote location in the argument I pass to the `hBaseConfig` method? Something like: `ssh root@92.168.21.128`
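A remote path cannot be handed to `addResource` directly (it only reads local files or classpath resources). One possible workaround, sketched below under the assumption that the remote cluster's ZooKeeper is reachable from the Spark driver, is to skip the XML file entirely and set the connection properties on the `Configuration` by hand. The quorum host is the server from the question; port `2181` is ZooKeeper's default client port and is an assumption here:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// Hypothetical variant of hBaseConfig that needs no local hbase-site.xml:
// the two ZooKeeper properties below are what TableInputFormat uses to
// locate the remote cluster.
val hBaseConfigRemote = (zkQuorum: String, tableName: String) => {
        val conf = HBaseConfiguration.create()
        conf.set("hbase.zookeeper.quorum", zkQuorum)              // e.g. "92.168.21.128"
        conf.set("hbase.zookeeper.property.clientPort", "2181")   // assumed default port
        conf.set(TableInputFormat.INPUT_TABLE, tableName)
        conf
}

// Usage: hBaseConfigRemote("92.168.21.128", "iemployee")
```

Alternatively, the file could simply be copied to the driver machine first (e.g. with `scp root@92.168.21.128:/etc/ams-hbase/conf/hbase-site.xml .`) and the local copy passed as before.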

0 Answers:

There are no answers yet.