Reading a Hive table in Spark

Date: 2016-10-19 15:22:09

Tags: apache-spark

If I have billions of records in a Hive table, which of the following two approaches is better?

Directly:

    // Read the table through the Hive metastore and write it back out as ORC.
    SparkConf conf = new SparkConf(true).setMaster("yarn-cluster").setAppName("DCA_HIVE_HDFS");
    SparkContext sc = new SparkContext(conf);
    HiveContext hc = new HiveContext(sc);
    DataFrame df = hc.table(tableName);
    df.write().orc(outputHdfsFile);
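
For context, the direct read can also prune partitions through the metastore. A minimal sketch, assuming a hypothetical partition column dt that is not in the original question:

    // Minimal sketch (assumed partition column "dt"): a filter on a Hive
    // partition column lets Spark prune partitions in the metastore, so only
    // the matching HDFS files are scanned, in parallel across executors.
    DataFrame partDf = hc.table(tableName).filter("dt = '2016-10-19'");
    partDf.write().orc(outputHdfsFile);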

Using JDBC:

    // Same job, but reading the table over a JDBC connection instead of
    // going through the Hive metastore directly.
    SparkConf conf = new SparkConf(true).setMaster("yarn-cluster").setAppName("DCA_HIVE_HDFS");
    SparkContext sc = new SparkContext(conf);
    SQLContext sqlContext = new SQLContext(sc);

    // Fail fast if the JDBC driver is not on the classpath, rather than
    // swallowing the exception and failing later inside the read.
    try {
        Class.forName(driverName);
    } catch (ClassNotFoundException e) {
        throw new RuntimeException("JDBC driver not found: " + driverName, e);
    }

    Properties props = new Properties();
    props.setProperty("user", userName);
    props.setProperty("password", password);
    props.setProperty("driver", driverName);

    DataFrame df = sqlContext.read().jdbc(connectionUri, tableName, props);
    df.write().orc(outputHdfsFile);
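
One point worth noting at this scale: the jdbc() call above pulls the whole table through a single connection into a single partition. A minimal sketch of the partitioned jdbc() overload (available on DataFrameReader since Spark 1.4), assuming a hypothetical numeric column id and placeholder bounds:

    // Minimal sketch: splits the read into numPartitions parallel queries
    // over a numeric column. The column "id", the bounds, and the partition
    // count are assumed placeholders, not values from the original question.
    DataFrame partitioned = sqlContext.read().jdbc(
            connectionUri,
            tableName,
            "id",          // numeric column to partition on (assumed)
            0L,            // lowerBound (assumed)
            1000000000L,   // upperBound (assumed)
            200,           // numPartitions (assumed)
            props);
    partitioned.write().orc(outputHdfsFile);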

0 Answers