Running pyspark 1.6.X works fine:
17/02/25 17:35:41 INFO storage.BlockManagerMaster: Registered BlockManager
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/
Using Python version 2.7.13 (default, Dec 17 2016 23:03:43)
SparkContext available as sc, SQLContext available as sqlContext.
>>>
But after I reset SPARK_HOME, PYTHONPATH and PATH to point to the spark 2.x installation, things quickly go south:
(a) I have to manually delete the derby metastore_db directory every time.
(b) pyspark won't start: it hangs after printing these unhappy warnings:
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/02/25 17:32:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/25 17:32:53 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/02/25 17:32:53 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
I don't need or care about the hive functionality, but it seems it may be required in the spark 2.X case. What is the simplest working hive configuration that keeps pyspark 2.X happy?
Answer 0 (score: 1)
Have you tried the enableHiveSupport function? When migrating from 1.6 to 2.x I ran into problems with DataFrames even though I wasn't accessing Hive at all. Calling that function on the builder solved my problem. (You can also add it to the configuration.)
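A minimal sketch of what that looks like when you build the session yourself in a PySpark 2.x script (the names spark and sc are just local variables here, and the appName string is arbitrary):

from pyspark.sql import SparkSession

# Build a SparkSession with Hive support enabled; in 2.x this replaces
# creating SQLContext/HiveContext by hand.
spark = SparkSession.builder \
    .appName("hive-enabled-session") \
    .enableHiveSupport() \
    .getOrCreate()

# The underlying SparkContext is still available if you need it.
sc = spark.sparkContext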
If you are using the pyspark shell, which configures the Spark context for you, you need to enable hive support through the configuration instead. In spark-defaults.conf, try adding spark.sql.catalogImplementation hive.
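That conf entry is essentially what enableHiveSupport sets under the hood, so when you create the session yourself (rather than relying on the shell's preconfigured one) the same property can be passed as a builder config; a sketch under that assumption:

from pyspark.sql import SparkSession

# Equivalent to the spark-defaults.conf line above: use the Hive catalog
# implementation instead of the default in-memory catalog.
spark = SparkSession.builder \
    .config("spark.sql.catalogImplementation", "hive") \
    .getOrCreate()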