Connecting to a remote Hive metastore from a Spark program in IntelliJ

Date: 2017-12-10 18:51:24

Tags: scala apache-spark intellij-idea

First, let's create a Hive-enabled Spark session:

val spark = SparkSession.builder.config(conf).enableHiveSupport.getOrCreate
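One thing worth noting (not from the question itself): when `hive-site.xml` is not found on the classpath, a builder like the one above will still succeed, so making the metastore location explicit can help. A sketch, where the thrift host and port are placeholders, not values from the question:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: point the session at the remote metastore explicitly
// instead of relying on hive-site.xml being discovered on the
// classpath. "metastore-host:9083" is a hypothetical address.
val spark = SparkSession.builder
  .config(conf)
  .config("hive.metastore.uris", "thrift://metastore-host:9083") // placeholder host
  .enableHiveSupport
  .getOrCreate
```

This is only a configuration fragment; the usual setup is to put the directory containing `hive-site.xml` on the application classpath instead.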

Then let's try to connect to the remote DB:

spark.sql("use my_remote_db").show

17/12/10 10:27:02 WARN ObjectStore: Failed to get database my_remote_db, returning NoSuchObjectException
Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'my_remote_db' not found;
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.org$apache$spark$sql$catalyst$catalog$SessionCatalog$$requireDbExists(SessionCatalog.scala:173)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.setCurrentDatabase(SessionCatalog.scala:268)
    at org.apache.spark.sql.execution.command.SetDatabaseCommand.run(databases.scala:59)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:182)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:67)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
    at com.mycompany.sbseg.graph.poc.features.LoadGraphData$.loadGraphData(LoadGraphData.scala:8)

Note: the same code works from spark-shell.
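Since the same code works in spark-shell, a quick diagnostic (my suggestion, not part of the question) is to check which catalog implementation the IntelliJ session actually ended up with. If the value printed below is "in-memory" rather than "hive", `hive-site.xml` was not on the classpath and Spark silently fell back to a local Derby-backed catalog, which only knows the `default` database:

```scala
// Diagnostic sketch: confirm the session is really using the Hive
// catalog, and list the databases it can see. An in-memory catalog
// explains a NoSuchDatabaseException for any remote database.
println(spark.conf.get("spark.sql.catalogImplementation"))
spark.sql("show databases").show
```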

Below are the additional settings in IntelliJ intended to emulate the shell environment in which spark-shell is launched from bash:

[screenshot: IntelliJ run configuration settings]

To verify that they are set correctly, the relevant environment variables are printed:

Seq("SPARK_HOME", "HIVE_CONF_DIR", "HIVE_HOME")
  .foreach { s => println(s"$s: ${System.getenv(s)}") }

These print the same (correct) values as on the command line.
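Matching environment variables is not the whole story, though: spark-shell also puts $HIVE_CONF_DIR (and hence `hive-site.xml`) on the JVM classpath, while an IntelliJ run configuration only sees the module's own classpath. A plain-JVM check, independent of Spark (the object name and helper are mine, for illustration):

```scala
// Check whether a given resource (e.g. hive-site.xml) is visible on
// the classpath of the current JVM. spark-shell picks hive-site.xml
// up via HIVE_CONF_DIR; an IntelliJ run configuration may not.
object ClasspathCheck {
  // Returns Some(url) if the class loader can locate the resource.
  def findOnClasspath(name: String): Option[java.net.URL] =
    Option(getClass.getClassLoader.getResource(name))

  def main(args: Array[String]): Unit = {
    println(s"hive-site.xml on classpath: ${findOnClasspath("hive-site.xml")}")
    println(s"java.class.path: ${System.getProperty("java.class.path")}")
  }
}
```

If the first line prints `None` inside IntelliJ but a URL under spark-shell, adding the Hive conf directory as a resources root (or dependency) in the IntelliJ module would be the difference to eliminate.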

So it is unclear what difference there could be between the IntelliJ and bash environments, and why the code fails to work in the former.

0 answers:

There are no answers.