Is there a way to get the database name in a Hive UDF?

Time: 2017-04-05 05:27:25

Tags: apache hadoop mapreduce hql

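I am writing a Hive UDF.

I need to get the name of the database (the one the function is deployed in). Then, depending on the database environment, I need to access some files from HDFS. Could you also tell me which functions are available for running HQL queries from a Hive UDF?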
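As a minimal sketch of the environment-dependent lookup described above, the plain-Java helper below derives an HDFS directory from a database name. The class name `EnvPaths`, the `_dev`/`_test` suffix convention, and the `/data/<env>/<name>` layout are all hypothetical assumptions, not anything Hive defines; adapt them to however your environments are actually named.

```java
import java.util.Locale;

// Hypothetical helper: maps a Hive database name to an environment-specific
// HDFS directory. The suffix convention (_dev, _test, otherwise prod) and the
// /data/<env>/<name> layout are illustrative assumptions only.
final class EnvPaths {
    private EnvPaths() {}

    /** Returns the environment encoded in the database name's suffix. */
    static String environmentOf(String dbName) {
        if (dbName == null || dbName.isEmpty()) {
            throw new IllegalArgumentException("database name is required");
        }
        // Hive database names are case-insensitive, so compare in lowercase.
        String lower = dbName.toLowerCase(Locale.ROOT);
        if (lower.endsWith("_dev")) return "dev";
        if (lower.endsWith("_test")) return "test";
        return "prod"; // assumed default when no known suffix is present
    }

    /** Strips a known environment suffix, e.g. "sales_dev" becomes "sales". */
    static String baseNameOf(String dbName) {
        String env = environmentOf(dbName);
        String lower = dbName.toLowerCase(Locale.ROOT);
        if (env.equals("prod")) {
            return lower;
        }
        // Remove "_" plus the environment name from the end.
        return lower.substring(0, lower.length() - env.length() - 1);
    }

    /** Builds the hypothetical HDFS directory for this database. */
    static String hdfsDirFor(String dbName) {
        return "/data/" + environmentOf(dbName) + "/" + baseNameOf(dbName);
    }
}
```

Inside a UDF, the returned string could then be wrapped in Hadoop's `org.apache.hadoop.fs.Path` and opened with `FileSystem.get(conf).open(path)`; that part is left out here because it needs a cluster configuration.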

1 Answer:

Answer 0 (score: 1)

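  1. Write the UDF class and package it into a jar file:

      package com.hiveudf.strmnp;

      import org.apache.hadoop.hive.ql.exec.UDF;
      import org.apache.hadoop.io.Text;

      // Prefixes each value with the database name supplied by the caller.
      public class MyHiveUdf extends UDF {
          public Text evaluate(String text, String dbName) {
              if (text == null) {
                  return null;
              }
              return new Text(dbName + "." + text);
          }
      }

  2. Use this UDF in a Hive query as shown below, passing current_database() as the second argument:

      hive> use mydb;
      OK
      Time taken: 0.454 seconds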

      hive> ADD jar /root/MyUdf.jar;
      Added [/root/MyUdf.jar] to class path
      Added resources: [/root/MyUdf.jar]
      
      hive> create temporary function myUdfFunction as 'com.hiveudf.strmnp.MyHiveUdf';
      OK
      Time taken: 0.018 seconds
      
      hive> select myUdfFunction(username,current_database()) from users;
      Query ID = root_20170407151010_2ae29523-cd9f-4585-b334-e0b61db2c57b
      Total jobs = 1
      Launching Job 1 out of 1
      Number of reduce tasks is set to 0 since there's no reduce operator
      Starting Job = job_1491484583384_0004, Tracking URL = http://mac127:8088/proxy/application_1491484583384_0004/
      Kill Command = /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/bin/hadoop job  -kill job_1491484583384_0004
      Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
      2017-04-07 15:11:11,376 Stage-1 map = 0%,  reduce = 0%
      2017-04-07 15:11:19,766 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.12 sec
      MapReduce Total cumulative CPU time: 3 seconds 120 msec
      Ended Job = job_1491484583384_0004
      MapReduce Jobs Launched:
      Stage-Stage-1: Map: 1   Cumulative CPU: 3.12 sec   HDFS Read: 21659 HDFS Write: 381120 SUCCESS
      Total MapReduce CPU Time Spent: 3 seconds 120 msec
      OK
      
      mydb.user1
      mydb.user2
      mydb.user3
      
      Time taken: 2.137 seconds, Fetched: 3 row(s)
      hive>
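Because the second argument comes from `current_database()`, every row of a given query passes the same `dbName`, so any HDFS file chosen from it only needs to be read once per UDF instance rather than once per row. A minimal sketch of such a cache, under assumptions: `PerDatabaseCache` is a hypothetical name, the `loader` function stands in for the code that actually reads the file, and each UDF instance is assumed to be called from a single thread (hence a plain `HashMap`).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-instance cache keyed by database name. The loader stands
// in for whatever reads the environment-specific file from HDFS; it runs at
// most once per distinct dbName (assuming it never returns null).
final class PerDatabaseCache<V> {
    private final Map<String, V> cache = new HashMap<>(); // single-threaded use assumed
    private final Function<String, V> loader;
    private int loads = 0; // number of times the loader actually ran

    PerDatabaseCache(Function<String, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for dbName, loading it on first access. */
    V get(String dbName) {
        // Note: computeIfAbsent does not store a null result, so a loader
        // that returns null would be called again on the next lookup.
        return cache.computeIfAbsent(dbName, key -> {
            loads++;
            return loader.apply(key);
        });
    }

    /** Exposes how many loads happened, e.g. to verify caching works. */
    int loadCount() {
        return loads;
    }
}
```

In a UDF, such a cache would typically be held in an instance field and consulted at the start of `evaluate`, so repeated rows reuse the already-loaded data.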