Getting an empty table from Hive using Spark with Scala

Asked: 2017-06-19 05:51:46

Tags: scala apache-spark hive

I want to write Scala code that uses Spark to get a DataFrame from a Hive server. I used the following code -

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation
import scala.util.Properties
import org.apache.spark.sql.SparkSession

val configuration = new Configuration
configuration.set("hadoop.security.authentication", "Kerberos")
Properties.setProp("java.security.krb5.conf", krb5LocationInMySystem)  
UserGroupInformation.setConfiguration(configuration)   
UserGroupInformation.loginUserFromKeytab(principal,keytabLocation)

val spSession = SparkSession.builder()
  .config("spark.master", "local")
  .config("spark.sql.warehouse.dir", "file:/Users/username/IdeaProjects/project_name/spark-warehouse/")
  .enableHiveSupport()
  .getOrCreate()

spSession.read.format("jdbc")
  .option("url", "jdbc:hive2://host:port/default;principal=hive/host@realm.com")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "tablename")
  .load()
  .show()

I get output like:

column1|column2|column3....

(only this much output)

While running, the program first waits for a while, saying:

Will try to open client transport with JDBC Uri:(url)
Code generated in 159.970292 ms

then a few more lines... and then again:

will try to open client transport with JDBC Uri:(url)
INFO JDBCRDD: closed connection

and it gives an empty table.
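
Since the session is built with enableHiveSupport(), the table can in principle also be read through the Hive metastore instead of over JDBC; a minimal sketch for comparison, assuming the cluster's hive-site.xml is on the classpath and tablename lives in the default database:

// Read the same table through the metastore rather than the Hive JDBC driver
// (assumes hive-site.xml is available and the table is in the `default` database)
val hiveDf = spSession.sql("SELECT * FROM default.tablename")
hiveDf.show()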

I have already searched these -

Spark SQL RDD loads in pyspark but not in spark-submit: "JDBCRDD: closed connection"

Hive creates empty table, even there're plenty of file

Hive Table returning empty result set on all queries

But either they don't explain what I'm looking for, or I can't understand what they are saying. For the second link, I tried but could not find how to use setInputPathFilter in Scala.
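
For the setInputPathFilter part, this is a minimal sketch of what calling it from Scala could look like, using Hadoop's mapreduce FileInputFormat; the VisibleFilesOnly class is a hypothetical filter, not something taken from the linked answer:

import org.apache.hadoop.fs.{Path, PathFilter}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat

// Hypothetical filter that skips hidden/underscore files in the input directories
class VisibleFilesOnly extends PathFilter {
  override def accept(path: Path): Boolean = {
    val name = path.getName
    !name.startsWith(".") && !name.startsWith("_")
  }
}

val job = Job.getInstance(configuration)   // reuses the Configuration created earlier
FileInputFormat.setInputPathFilter(job, classOf[VisibleFilesOnly])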

Dependencies:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.8</version>
</dependency>
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-compiler</artifactId>
    <version>2.11.8</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.1.1</version>
</dependency>

0 Answers:

There are no answers yet.