Unable to run pyspark 2.X due to Hive metastore connection problems

Date: 2017-02-26 01:39:56

Tags: apache-spark hive pyspark

When I run pyspark 1.6.X, everything is fine:

17/02/25 17:35:41 INFO storage.BlockManagerMaster: Registered BlockManager
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Python version 2.7.13 (default, Dec 17 2016 23:03:43)
SparkContext available as sc, SQLContext available as sqlContext.
>>>

But after I reset SPARK_HOME, PYTHONPATH, and PATH to point at a Spark 2.x installation, things go south quickly:

(a) I have to manually delete the Derby metastore_db every time.

(b) pyspark fails to start: after printing these unpleasant warnings, it hangs:

[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/02/25 17:32:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/25 17:32:53 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/02/25 17:32:53 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException

I do not need or care about Hive functionality, but it seems to be required by Spark 2.X. What is the simplest working Hive configuration that will make pyspark 2.X happy?

1 Answer:

Answer 0 (score: 1):

Have you tried the enableHiveSupport function? When migrating from 1.6 to 2.x I ran into problems with DataFrames even though I was not accessing Hive at all. Calling that function on the builder solved my problem. (You can also add the equivalent setting to the configuration.)
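A minimal sketch of what that looks like in a standalone script (the app name is just an example; the warehouse location and metastore behavior still follow whatever Hive/Derby defaults apply on your machine):

# Build a SparkSession with Hive support enabled (Spark 2.x API)
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-support-example")  # example name, not required
    .enableHiveSupport()              # use the Hive catalog instead of the in-memory one
    .getOrCreate()
)

# Quick sanity check that the catalog is reachable
spark.sql("SHOW DATABASES").show()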

If you are using the pyspark shell, the SparkContext is configured for you, so to enable Hive support you need to do it through configuration. Try adding spark.sql.catalogImplementation hive to spark-defaults.conf.
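A sketch of both ways to set it for the shell (the conf file path assumes a default Spark layout, and the setting has to be in place before the shell starts; it cannot be changed on an already running session):

# In $SPARK_HOME/conf/spark-defaults.conf:
spark.sql.catalogImplementation    hive

# Or pass the same setting on the command line when launching the shell:
pyspark --conf spark.sql.catalogImplementation=hive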