Pointing HiveServer2 to MiniMRCluster for Hive Testing

Date: 2014-10-31 00:37:51

Tags: jdbc hive integration-testing

I've been wanting to get Hive integration tests going for some code I'm developing. Two major requirements for the test framework I need:

  1. It needs to work with the Cloudera version of Hive and Hadoop (preferably 2.0.0-cdh4.7.0).
  2. It needs to be all local. That is, the Hadoop cluster and the Hive server should start up at the beginning of the test, run a few queries, and be torn down after the test is over.
So I broke this problem down into three parts:

  1. Getting code for the HiveServer2 part (I decided to use a JDBC connector over the Thrift service client)
  2. Getting code for building an in-memory MapReduce cluster (I decided to use MiniMRCluster for this)
  3. Setting up (1) and (2) above to work with each other.

By looking at many resources, I was able to get (1) out of the way.

For (2), I followed an excellent post on StackOverflow.

So far, so good. At this point, the pom.xml in my Maven project, including both of the above functionalities, looks like this:

      <repositories>
          <repository>
              <id>cloudera</id>
              <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
          </repository>
      </repositories>
      
      <dependencies>
          <dependency>
              <groupId>commons-io</groupId>
              <artifactId>commons-io</artifactId>
              <version>2.1</version>
          </dependency>
          <dependency>
              <groupId>junit</groupId>
              <artifactId>junit</artifactId>
              <version>4.11</version>
          </dependency>
          <!-- START: dependencies for getting MiniMRCluster to work -->
          <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-auth</artifactId>
              <version>2.0.0-cdh4.7.0</version>
          </dependency>
          <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-test</artifactId>
              <version>2.0.0-mr1-cdh4.7.0</version>
          </dependency>
          <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-hdfs</artifactId>
              <version>2.0.0-cdh4.7.0</version>
          </dependency>
          <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-hdfs</artifactId>
              <version>2.0.0-cdh4.7.0</version>
              <classifier>tests</classifier>
          </dependency>
          <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-common</artifactId>
              <version>2.0.0-cdh4.7.0</version>
          </dependency>
          <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-common</artifactId>
              <version>2.0.0-cdh4.7.0</version>
              <classifier>tests</classifier>
          </dependency>
          <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-core</artifactId>
              <version>2.0.0-mr1-cdh4.7.0</version>
          </dependency>
          <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-core</artifactId>
              <version>2.0.0-mr1-cdh4.7.0</version>
              <classifier>tests</classifier>
          </dependency>
          <!-- END: dependencies for getting MiniMRCluster to work -->
      
          <!-- START: dependencies for getting Hive JDBC to work -->
          <dependency>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-builtins</artifactId>
              <version>${hive.version}</version>
          </dependency>
          <dependency>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-cli</artifactId>
              <version>${hive.version}</version>
          </dependency>
          <dependency>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-metastore</artifactId>
              <version>${hive.version}</version>
          </dependency>
          <dependency>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-serde</artifactId>
              <version>${hive.version}</version>
          </dependency>
          <dependency>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-common</artifactId>
              <version>${hive.version}</version>
          </dependency>
          <dependency>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-exec</artifactId>
              <version>${hive.version}</version>
          </dependency>
          <dependency>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-jdbc</artifactId>
              <version>${hive.version}</version>
          </dependency>
          <dependency>
              <groupId>org.apache.thrift</groupId>
              <artifactId>libfb303</artifactId>
              <version>0.9.1</version>
          </dependency>
          <dependency>
              <groupId>log4j</groupId>
              <artifactId>log4j</artifactId>
              <version>1.2.15</version>
          </dependency>
          <dependency>
              <groupId>org.antlr</groupId>
              <artifactId>antlr-runtime</artifactId>
              <version>3.5.1</version>
          </dependency>
          <dependency>
              <groupId>org.apache.derby</groupId>
              <artifactId>derby</artifactId>
              <version>10.10.1.1</version>
          </dependency>
          <dependency>
              <groupId>javax.jdo</groupId>
              <artifactId>jdo2-api</artifactId>
              <version>2.3-ec</version>
          </dependency>
          <dependency>
              <groupId>jpox</groupId>
              <artifactId>jpox</artifactId>
              <version>1.1.9-1</version>
          </dependency>
          <dependency>
              <groupId>jpox</groupId>
              <artifactId>jpox-rdbms</artifactId>
              <version>1.2.0-beta-5</version>
          </dependency>
          <!-- END: dependencies for getting Hive JDBC to work -->
      </dependencies>
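
As an aside, the Hive dependencies above reference a `${hive.version}` property that is not shown in the snippet; it would be defined in the POM's `<properties>` block. A minimal sketch of that block follows. The version number here is an assumption (the Hive build that shipped with CDH 4.7.0), not something stated in the original post:

```xml
<properties>
    <!-- Assumed value: the Hive release bundled with CDH 4.7.0 -->
    <hive.version>0.10.0-cdh4.7.0</hive.version>
</properties>
```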
      

Now I'm at step (3). I tried running the following code:

    @Test
    public void testHiveMiniDFSClusterIntegration() throws IOException, SQLException {
        Configuration conf = new Configuration();

        /* Build MiniDFSCluster */
        MiniDFSCluster miniDFS = new MiniDFSCluster.Builder(conf).build();

        /* Build MiniMR Cluster */
        System.setProperty("hadoop.log.dir", "/Users/nishantkelkar/IdeaProjects/" +
                "nkelkar-incubator/hive-test/target/hive/logs");
        int numTaskTrackers = 1;
        int numTaskTrackerDirectories = 1;
        String[] racks = null;
        String[] hosts = null;
        MiniMRCluster miniMR = new MiniMRCluster(numTaskTrackers, miniDFS.getFileSystem().getUri().toString(),
                numTaskTrackerDirectories, racks, hosts, new JobConf(conf));

        System.setProperty("mapred.job.tracker", miniMR.createJobConf(
                new JobConf(conf)).get("mapred.job.tracker"));

        try {
            String driverName = "org.apache.hive.jdbc.HiveDriver";
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }

        Connection hiveConnection = DriverManager.getConnection(
                "jdbc:hive2:///", "", "");
        Statement stm = hiveConnection.createStatement();

        // now create test tables and query them
        stm.execute("set hive.support.concurrency = false");
        stm.execute("drop table if exists test");
        stm.execute("create table if not exists test(a int, b int) row format delimited fields terminated by ' '");
        stm.execute("create table dual as select 1 as one from test");
        stm.execute("insert into table test select stack(1,4,5) AS (a,b) from dual");
        stm.execute("select * from test");
    }
      

I was hoping step (3) would be taken care of by the following lines from the above method:

          Connection hiveConnection = DriverManager.getConnection(
                  "jdbc:hive2:///", "", "");
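
A note on that URL: a host-less "jdbc:hive2:///" runs HiveServer2 embedded in the test JVM, which is why the test never launches a separate server process; a standalone server would be addressed with an explicit host and port. A small sketch contrasting the two forms (the remote host and port below are illustrative values, not from the post):

```java
import java.net.URI;

public class HiveJdbcUrls {
    public static void main(String[] args) {
        // Strip the "jdbc:" prefix so java.net.URI can parse the remainder.
        URI embedded = URI.create("jdbc:hive2:///".substring(5));
        URI remote = URI.create("jdbc:hive2://localhost:10000/default".substring(5));

        // Embedded mode: empty authority, no host -> Hive runs in-process with the test.
        System.out.println("embedded host: " + embedded.getHost());
        // Remote mode: explicit host/port of a standalone HiveServer2 (illustrative).
        System.out.println("remote host: " + remote.getHost() + ":" + remote.getPort());
    }
}
```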
      

However, I got the following error:

      java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
          at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:161)
          at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:150)
          at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:207)
          at com.ask.nkelkar.hive.HiveUnitTest.testHiveMiniDFSClusterIntegration(HiveUnitTest.java:54)
      

Can anyone tell me what I'm doing wrong, and what I need to do to get this working?

P.S. I looked at the HiveRunner and hive_test projects as options, but I wasn't able to get them working with the Cloudera version of Hadoop.

1 Answer:

Answer 0 (score: 2):

Your test is failing at the first create table statement. Hive is unhelpfully suppressing the following error message:

file:/user/hive/warehouse/test is not a directory or unable to create one

Hive is attempting to use the default warehouse directory /user/hive/warehouse, which doesn't exist on the file system. You could create the directory, but for testing you'll likely want to override the default. For example:

import static org.apache.hadoop.hive.conf.HiveConf.ConfVars;
...
System.setProperty(ConfVars.METASTOREWAREHOUSE.toString(), "/Users/nishantkelkar/IdeaProjects/" +
            "nkelkar-incubator/hive-test/target/hive/warehouse");
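
To tie this into the test, the property needs to be set before the first JDBC connection brings up the embedded HiveServer2, and the directory should actually exist. A minimal pure-Java sketch of that setup step; note the literal key "hive.metastore.warehouse.dir" is assumed here to be what ConfVars.METASTOREWAREHOUSE resolves to (stated as an assumption so the sketch needs no Hive classes on the classpath), and the target/hive/warehouse path is illustrative:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WarehouseDirSetup {
    public static void main(String[] args) throws Exception {
        // Put the warehouse under the build directory so each build starts clean.
        Path warehouse = Paths.get("target", "hive", "warehouse").toAbsolutePath();
        Files.createDirectories(warehouse);

        // Assumed property key; must be set before the embedded
        // HiveServer2/metastore is first initialized by the JDBC connection.
        System.setProperty("hive.metastore.warehouse.dir", warehouse.toString());
        System.out.println(System.getProperty("hive.metastore.warehouse.dir"));
    }
}
```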