Question

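I am using Scalding with Spyglass to read from and write to HBase.

I am doing a left outer join of table1 and table2 and writing the result back to table1 after transforming a column. Both table1 and table2 are declared as Spyglass HBaseSource.

This works fine. However, to compute the transformed value I need to access a different row in table1 by its rowkey.

I tried the following for the HBase get:

  val hTable = new HTable(conf, TABLE_NAME)
  val result = hTable.get(new Get(rowKey.getBytes()))

I can access the configuration in Scalding as described at this link: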

https://github.com/twitter/scalding/wiki/Frequently-asked-questions#how-do-i-access-the-jobconf

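This works when I run the Scalding job locally. When I run it on the cluster, however, conf is null when this code executes in the Reducer.

Is there a better way to do an HBase get/scan inside a Scalding/Cascading job for cases like this?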

Answer 1

There are two ways to do this...

1) You can use a managed resource

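import com.twitter.scalding._
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Get, HTablePool}
import resource.managed // assumption: `managed` comes from scala-arm
import scala.collection.JavaConverters._

class SomeJob(args: Args) extends Job(args) {
  // addResource returns Unit, so it cannot be chained onto create()
  val someConfig = {
    val c = HBaseConfiguration.create()
    c.addResource(new Path(pathtoyourxmlfile))
    c
  }
  // lazy: the pool is created on first use, not when the job is constructed
  lazy val hPool = new HTablePool(someConfig, 3)

  def getConf = {
    implicitly[Mode] match {
      case Hdfs(_, conf) => conf
      case _ => ... // whatever you are doing for a local conf
    }
  }

  ... somePipe.someOperation.... {
        // keys: the row keys to look up, assumed here to be a Seq[Array[Byte]]
        val gets = keys.map { key => new Get(key) }
        // managed(...) closes the table once the block finishes
        managed(hPool.getTable("myTableName")) acquireAndGet { table =>
          val results = table.get(gets.asJava)
          ... // do something with these results
        }
     }
}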
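
The `lazy val` in the snippet above matters if the null conf in the question comes from a value that was built when the job was constructed and did not survive being shipped to the tasks. That cause is an assumption; the question does not confirm it. The following is a minimal, HBase-free sketch of the effect, using plain Java serialization as a stand-in for whatever mechanism ships the job to the cluster. `FakeConnection`, `EagerHolder`, `LazyHolder` and `SerializationDemo` are hypothetical names invented for this illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Stand-in for a non-serializable handle such as a Configuration or a table pool.
class FakeConnection(val name: String)

class EagerHolder extends Serializable {
  // Built once, where the holder is constructed; @transient drops it on serialization.
  @transient val conn: FakeConnection = new FakeConnection("eager")
}

class LazyHolder extends Serializable {
  // Built on first access, so it is rebuilt in whichever JVM touches it first.
  @transient lazy val conn: FakeConnection = new FakeConnection("lazy")
}

object SerializationDemo {
  // Serializes and deserializes a value, mimicking a trip from the submitter to a task.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val eager = roundTrip(new EagerHolder)
    val lazyHolder = roundTrip(new LazyHolder)
    println(s"eager conn is null: ${eager.conn == null}")
    println(s"lazy conn name: ${lazyHolder.conn.name}")
  }
}
```

After the round trip the eager field is null because the constructor is not re-run on deserialization, while the lazy field is initialized again on first access. The same reasoning would explain why code that works in local mode, where nothing is shipped between JVMs, can fail on the cluster.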

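2) You can use more Cascading-specific code: write a custom Scheme and override the source method, and possibly other methods, as your needs dictate. There you can access the JobConf like this: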

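class MyScheme extends Scheme[JobConf, SomeRecordReader, SomeOutputCollector, ..] {

  // Not serialized with the scheme; refreshed on the task side inside source()
  @transient var jobConf: Configuration = super.jobConfiguration

  override def source(flowProcess: FlowProcess[JobConf], ...): Boolean = {
    jobConf = flowProcess match {
      // On a Hadoop cluster, take the JobConf from the running flow process
      case h: HadoopFlowProcess => h.getJobConf
      // Otherwise keep whatever configuration is already set
      case _ => jobConf
    }

    ... // do something with the jobConf here

  }

}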
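
As a rough sketch of what "do something with the jobConf" could look like for the lookup in the question, the helper below performs a single-row get from a configuration obtained on the task side. `JobConfLookup`, `getFor` and `fetchRow` are hypothetical names, not part of Scalding, Cascading or Spyglass, and the sketch assumes the pre-1.0 HBase client API (`HTable`) that the question already uses.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Get, HTable, Result}
import org.apache.hadoop.hbase.util.Bytes

// Hypothetical helper for illustration only.
object JobConfLookup {
  // Builds the Get for a string row key.
  def getFor(rowKey: String): Get = new Get(Bytes.toBytes(rowKey))

  // Fetches one row using a configuration obtained on the task side.
  // Assumption: jobConf (or the task classpath) carries the HBase/ZooKeeper settings.
  def fetchRow(jobConf: Configuration, tableName: String, rowKey: String): Option[Result] = {
    // HBase defaults merged with the job's own settings
    val hbaseConf = HBaseConfiguration.create(jobConf)
    val table = new HTable(hbaseConf, tableName)
    try {
      val result = table.get(getFor(rowKey))
      if (result.isEmpty) None else Some(result)
    } finally {
      table.close()
    }
  }
}
```

Opening a table per call is expensive, so for many lookups it would probably be better to reuse one table per task or the pool from option 1.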
