Question

I am trying to write a map-reduce job that computes the distribution of field values in a Hive table (Hadoop 2.2.0.2.0.6.0-101). For example:

Input Hive table "ATable":

+-------+--------+
| name  | rating |
+-------+--------+
| Bond  |  7     |
| Megre |  2     |
| Holms |  11    |
| Puaro |  7     |
| Holms |  1     |
| Puaro |  7     |
| Megre |  2     |
| Puaro |  7     |
+-------+--------+

The map-reduce job should produce the following output table, also in Hive:

+--------+-------+--------+
| Field  | Value |  Count |
+--------+-------+--------+
| name   | Bond  |   1    |
| name   | Puaro |   3    |
| name   | Megre |   2    |
| name   | Holms |   2    |
| rating | 7     |   4    |
| rating | 11    |   1    |
| rating | 1     |   1    |
| rating | 2     |   2    |
+--------+-------+--------+
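
The counting described above can be sketched in plain Java, with no Hadoop or HCatalog dependency, just to pin down what the mapper and reducer are expected to compute. This is a minimal sketch under assumptions: the class name FieldValueDistribution and the helpers mapRow, countKeys and sampleRows are hypothetical names chosen for illustration, and in a real job the field names would come from the HCatalog schema rather than a hard-coded array.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free illustration of the field/value counting logic.
class FieldValueDistribution {

    // "Map" step: emit one "field<TAB>value" key for every column of a row.
    static List<String> mapRow(String[] fieldNames, String[] row) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < fieldNames.length; i++) {
            keys.add(fieldNames[i] + "\t" + row[i]);
        }
        return keys;
    }

    // "Reduce" step: sum the number of times each key was emitted.
    static Map<String, Integer> countKeys(String[] fieldNames, List<String[]> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] row : rows) {
            for (String key : mapRow(fieldNames, row)) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    // The rows of the sample table "ATable" from the question.
    static List<String[]> sampleRows() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Bond", "7"});
        rows.add(new String[] {"Megre", "2"});
        rows.add(new String[] {"Holms", "11"});
        rows.add(new String[] {"Puaro", "7"});
        rows.add(new String[] {"Holms", "1"});
        rows.add(new String[] {"Puaro", "7"});
        rows.add(new String[] {"Megre", "2"});
        rows.add(new String[] {"Puaro", "7"});
        return rows;
    }

    public static void main(String[] args) {
        String[] fieldNames = {"name", "rating"};
        // Print one "field<TAB>value<TAB>count" line per distinct pair.
        for (Map.Entry<String, Integer> e : countKeys(fieldNames, sampleRows()).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In an actual map-reduce job the composite key would be what the mapper writes to the context, and the summing would happen in the reducer; the sketch only shows that emitting one key per column is enough to get the (Field, Value, Count) rows.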

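To get the field names and values I need access to the HCatalog metadata, so that I can use them in the map method (org.apache.hadoop.mapreduce.Mapper). To do this, I tried to adapt the following example: http://java.dzone.com/articles/mapreduce-hive-tables-using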

The code in this example compiles but produces a lot of deprecation warnings:

protected void map(WritableComparable key, HCatRecord value,
 org.apache.hadoop.mapreduce.Mapper.Context context)
 throws IOException, InterruptedException {

 // Get table schema
 HCatSchema schema = HCatBaseInputFormat.getTableSchema(context);

 Integer year = new Integer(value.getString("year", schema));
 Integer month = new Integer(value.getString("month", schema));
 Integer DayofMonth = value.getInteger("dayofmonth", schema);

 context.write(new IntWritable(month), new IntWritable(DayofMonth));
}

Deprecation warnings:

HCatRecord
HCatSchema 
HCatBaseInputFormat.getTableSchema

Where can I find a similar example of using HCatalog in map-reduce that uses the latest, non-deprecated interfaces?

Thanks!

Answer 1

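I used the example given in one of the Cloudera examples, together with the framework given in this blog, to compile my code. I also had to add the Maven dependency for hcatalog to pom.xml. This example uses the new mapreduce API rather than the deprecated mapred API. Hope it helps.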

        <dependency>
            <groupId>org.apache.hcatalog</groupId>
            <artifactId>hcatalog-core</artifactId>
            <version>0.11.0</version>
        </dependency>
