I want to schedule an HBase MapReduce job with Oozie, and I am facing the following problem.
How and where do I specify these properties in the Oozie workflow?
(i) the table name for the Mapper/Reducer
(ii) the Scan object for the Mapper
Scan scan = new Scan(new Get());
scan.setMaxVersions();
scan.addColumn(Bytes.toBytes(FAMILY),
Bytes.toBytes(VALUE));
scan.addColumn(Bytes.toBytes(FAMILY),
Bytes.toBytes(DATE));
Job job = new Job(conf, JOB_NAME + "_" + TABLE_USER);
// These two properties :-
TableMapReduceUtil.initTableMapperJob(TABLE_USER, scan,
Mapper.class, Text.class, Text.class, job);
TableMapReduceUtil.initTableReducerJob(DETAILS_TABLE,
Reducer.class, job);
Or
please let me know the best way to schedule an HBase MapReduce job with Oozie.
Thanks :) :)
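Answer 0 (score: 3)
The best way I know of to schedule an HBase MapReduce job is to schedule it as a Java action, that is, to let Oozie run the job's driver class. It works well, and there is no need to write extra code to convert the Scan to a string and so on. So I schedule my jobs as Java actions until I find a better option.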
<workflow-app xmlns="uri:oozie:workflow:0.1" name="java-main-wf">
<start to="java-node"/>
<action name="java-node">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<main-class>org.apache.oozie.example.DemoJavaMain</main-class>
<arg>Hello</arg>
<arg>Oozie!</arg>
<arg>This</arg>
<arg>is</arg>
<arg>Demo</arg>
<arg>Oozie!</arg>
</java>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
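Answer 1 (score: 0)
You can also schedule the job with the <map-reduce> action, but it is not as easy as scheduling it as a Java action. It takes considerably more effort, but it can be considered as an alternative.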
<action name='jobSample'>
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<!-- This is required for new api usage -->
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<!-- HBASE CONFIGURATIONS -->
<property>
<name>hbase.mapreduce.inputtable</name>
<value>TABLE_USER</value>
</property>
<property>
<name>hbase.mapreduce.scan</name>
<value>${wf:actionData('get-scanner')['scan']}</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>${hbaseZookeeperClientPort}</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>${hbaseZookeeperQuorum}</value>
</property>
<!-- MAPPER CONFIGURATIONS -->
<property>
<name>mapreduce.inputformat.class</name>
<value>org.apache.hadoop.hbase.mapreduce.TableInputFormat</value>
</property>
<property>
<name>mapred.mapoutput.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapred.mapoutput.value.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>com.hbase.mapper.MyTableMapper</value>
</property>
<!-- REDUCER CONFIGURATIONS -->
<property>
<name>mapreduce.reduce.class</name>
<value>com.hbase.reducer.MyTableReducer</value>
</property>
<property>
<name>hbase.mapred.outputtable</name>
<value>DETAILS_TABLE</value>
</property>
<property>
<name>mapreduce.outputformat.class</name>
<value>org.apache.hadoop.hbase.mapreduce.TableOutputFormat</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>${mapperCount}</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>${reducerCount}</value>
</property>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
</map-reduce>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Map/Reduce failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
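To learn more about the property names and values, dump the job's configuration parameters. Also, the scan property is a serialized (Base64-encoded) form of the Scan object, so I am not sure how to specify calls like the following directly in the workflow: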
scan.addColumn(Bytes.toBytes(FAMILY),
Bytes.toBytes(VALUE));
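One possible way to feed `${wf:actionData('get-scanner')['scan']}` is a small Java action named `get-scanner` that declares `<capture-output/>` and writes a `scan` property to the file Oozie names in the `oozie.action.output.properties` system property. The sketch below covers only that hand-off. The class name `ScanToActionData` and the placeholder bytes are assumptions for illustration; producing the real serialized Scan bytes depends on the HBase version in use and is deliberately left out.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Base64;
import java.util.Properties;

// Hypothetical helper class; the name is not part of Oozie or HBase.
public class ScanToActionData {

    // Base64-encode bytes that already hold a serialized Scan.
    // How those bytes are produced is HBase-version specific and not shown.
    public static String encodeScan(byte[] serializedScan) {
        return Base64.getEncoder().encodeToString(serializedScan);
    }

    // Write the encoded scan under the key "scan" in Java properties format,
    // which is what wf:actionData('get-scanner')['scan'] would look up.
    public static void writeActionData(File out, String encodedScan) throws IOException {
        Properties props = new Properties();
        props.setProperty("scan", encodedScan);
        try (OutputStream os = new FileOutputStream(out)) {
            props.store(os, null);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes (assumption); replace with the real serialized Scan.
        byte[] serializedScan = new byte[] {1, 2, 3};

        // Oozie sets this system property when the action declares <capture-output/>.
        String path = System.getProperty("oozie.action.output.properties");
        if (path == null) {
            throw new IllegalStateException("oozie.action.output.properties is not set");
        }
        writeActionData(new File(path), encodeScan(serializedScan));
    }
}
```

With this approach the column selection stays in Java code, and the workflow only passes the encoded string along. Whether the string decodes correctly on the `TableInputFormat` side depends on using the same serialization as the HBase version on the cluster, so treat this as a starting point to verify rather than a drop-in solution.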