hbase(main):011:0> scan 'iemployee'
ROW COLUMN+CELL
1 column=insurance:dental, timestamp=1457693003758, value=metlife
1 column=insurance:health, timestamp=1457693003727, value=anthem
1 column=insurance:life, timestamp=1457693003814, value=metlife
1 column=insurance:vision, timestamp=1457693003786, value=visionOne
1 column=payroll:grade, timestamp=1457693003619, value=G16
1 column=payroll:salary, timestamp=1457693003647, value=250000.00
1 column=personal:city, timestamp=1457693003536, value=San Fransisco
1 column=personal:fname, timestamp=1457693003430, value=Mike
1 column=personal:lname, timestamp=1457693003503, value=Young
1 column=personal:zip, timestamp=1457693003590, value=12345
1 column=skills:interpersonal-rating, timestamp=1457693003694, value=medium
1 column=skills:management, timestamp=1457693003669, value=executive,creator,innovative
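I got this data from the table iemployee (a default table that ships with HortonWorks HBase), and I am trying to query the data in that table using Spark.
I have the following code in Spark: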
import org.apache.hadoop.hbase.client.{HBaseAdmin, Result}
import org.apache.hadoop.hbase.{ HBaseConfiguration, HTableDescriptor }
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark._
object HbaseTest {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("Spark-Hbase").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    // This anonymous function is called in the newAPIHadoopRDD call below
    val hBaseCongig = (hbaseSiteXml: String, tableName: String) => {
      val hbaseConfiguration = HBaseConfiguration.create()
      hbaseConfiguration.addResource(hbaseSiteXml)
      hbaseConfiguration.set(TableInputFormat.INPUT_TABLE, tableName)
      hbaseConfiguration
    }
    val rdd = sc.newAPIHadoopRDD(
      hBaseCongig("/etc/ams-hbase/conf/hbase-site.xml", "iemployee"),
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result]
    )
    // Read the insurance:dental cell of each row as a String
    val dentalValues = rdd
      .map(result => result._2)
      .map(result => Bytes.toString(result.getValue(Bytes.toBytes("insurance"), Bytes.toBytes("dental"))))
  }
}
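Question:
The hbaseSiteXml argument that I pass to the anonymous function hBaseCongig, which is called in the newAPIHadoopRDD method, is the path to hbase-site.xml, and that file is located on another server, which I access with ssh root@92.168.21.128. Is there a way to specify that remote location in the argument I pass to the hBaseCongig method, or something along those lines?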
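For context, here are two directions that might apply, sketched under assumptions rather than confirmed against this cluster. As far as I know, Hadoop's Configuration cannot read a file over ssh, so the file would either be copied to the local machine first, or the connection settings would be set directly. The object name RemoteHBaseConfig and both helper names are hypothetical, and the ZooKeeper host and port values are assumptions about the remote setup.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// Hypothetical helpers, not part of HBase or Spark.
object RemoteHBaseConfig {

  // Option 1: skip the remote file and set the connection properties directly.
  // Assumption: zkQuorum is the host (or comma-separated hosts) running
  // ZooKeeper for the remote HBase cluster, and zkPort is its client port.
  def fromQuorum(zkQuorum: String, zkPort: String, tableName: String): Configuration = {
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", zkQuorum)
    conf.set("hbase.zookeeper.property.clientPort", zkPort)
    conf.set(TableInputFormat.INPUT_TABLE, tableName)
    conf
  }

  // Option 2: load a copy of hbase-site.xml that was first copied to this machine.
  // Passing a Path makes Hadoop read the local filesystem; a plain String
  // is treated as a classpath resource name instead.
  def fromLocalCopy(localHbaseSiteXml: String, tableName: String): Configuration = {
    val conf = HBaseConfiguration.create()
    conf.addResource(new Path(localHbaseSiteXml))
    conf.set(TableInputFormat.INPUT_TABLE, tableName)
    conf
  }
}
```

Either result could be passed as the first argument to sc.newAPIHadoopRDD in place of the hBaseCongig(...) call, for example RemoteHBaseConfig.fromQuorum("92.168.21.128", "2181", "iemployee"), where 2181 is only the usual ZooKeeper default. Depending on the cluster, zookeeper.znode.parent may also need to be set to match the value in the remote hbase-site.xml.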