Using Spark on Hive through Eclipse

Posted: 2016-02-22 04:59:50

Tags: hadoop jdbc apache-spark hive apache-spark-sql

We connect to Hive from an Eclipse program through the JDBC API to access Hive tables; here is the code:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

import org.testng.annotations.Test;

public class FetchHiveData_test {

    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    @Test
    public void fetchHiveDataMethod() {
        try {
            System.out.println("Inside Hive Method");
            Class.forName(driverName);

            /*********** Hive ***************/
            // Host, port, user and password are placeholders.
            try (Connection con = DriverManager.getConnection(
                         "jdbc:hive2://XXXXX:20000", "xxxxxx", "yyyyy");
                 Statement stmt = con.createStatement()) {

                String sql = "select count(*) from table";
                ResultSet hiveres = stmt.executeQuery(sql);

                // Print the column headers.
                ResultSetMetaData rsmd = hiveres.getMetaData();
                int numCols = rsmd.getColumnCount();
                for (int j = 1; j <= numCols; j++) {
                    System.out.print(rsmd.getColumnName(j) + " ");
                }
                System.out.println();

                // Print each row, one element at a time.
                while (hiveres.next()) {
                    for (int i = 1; i <= numCols; i++) {
                        System.out.print(hiveres.getString(i) + " ");
                    }
                    System.out.println(); // Move to the next line for the next row.
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The approach here uses a Spark context, but I am not sure where the server name and credentials go. How can the above program be modified to use Spark? The goal is to make our queries run faster, since the JDBC API is somewhat slow.

Spark code:

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;
import org.testng.annotations.Test;

public class SparkTest {

    @SuppressWarnings("serial")
    @Test
    public void f() {
        // setMaster expects a Spark master URL such as "spark://host:7077"
        // (or "local[*]" for testing), not the HiveServer2 host:port.
        final SparkConf sparkConf = new SparkConf()
                .setMaster("spark://xxxxx:7077")
                .setAppName("HiveConnector");
        final JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        JavaHiveContext hiveCtx = new JavaHiveContext(sparkContext);

        JavaSchemaRDD rdd = hiveCtx.sql("select count(*) from table");
        // count(*) comes back as a Hive bigint, so read it as a long.
        JavaRDD<Long> keys = rdd.map(new Function<Row, Long>() {
            public Long call(Row row) {
                return row.getLong(0);
            }
        });

        List<Long> res = keys.collect();
        for (Long val : res) {
            System.out.println("val " + val);
        }
        sparkContext.stop();
    }
}
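
On the question of where the server name and credentials go: a HiveContext does not take a JDBC URL, user, or password at all. The sketch below shows where each setting usually lives; the master URL and metastore address are placeholders, not values from the original post. In Spark 1.x the metastore location is normally picked up from a hive-site.xml on the application classpath rather than set in code.

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class HiveContextSetupSketch {
    public static void main(String[] args) {
        // setMaster takes a Spark master URL ("spark://host:7077",
        // "yarn-client", or "local[*]" for testing), not the HiveServer2
        // host and port used in the JDBC program.
        SparkConf conf = new SparkConf()
                .setMaster("spark://xxxxx:7077")
                .setAppName("HiveConnector");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // JavaHiveContext reads the metastore location from hive-site.xml on
        // the classpath, typically an entry such as:
        //   <property>
        //     <name>hive.metastore.uris</name>
        //     <value>thrift://xxxxx:9083</value>
        //   </property>
        // No JDBC user name or password is involved at this layer.
        JavaHiveContext hiveCtx = new JavaHiveContext(sc);

        List<Row> rows = hiveCtx.sql("select count(*) from table").collect();
        System.out.println("count = " + rows.get(0).getLong(0));
        sc.stop();
    }
}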

1 Answer:

Answer 0 (score: 0)

Try running the Thrift JDBC/ODBC server (http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server); you can then access it over JDBC, just as with Hive.

Note: this may not support all of HiveQL.
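
To make the answer concrete: once the Thrift server has been started (sbin/start-thriftserver.sh in the Spark distribution), the original JDBC program needs nothing more than a different URL, since the Spark Thrift server speaks the HiveServer2 protocol. A minimal sketch, assuming the server runs at its default port 10000; host and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchViaSparkThriftServer {
    public static void main(String[] args) throws Exception {
        // Same Hive JDBC driver and URL scheme as before; only the host and
        // port change, pointing at the Spark Thrift server instead of
        // HiveServer2. The query now executes on Spark.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://xxxxx:10000", "xxxxxx", "yyyyy");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("select count(*) from table")) {
            while (rs.next()) {
                System.out.println("count = " + rs.getLong(1));
            }
        }
    }
}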