I have successfully installed Python and Anaconda, but I ran into a problem while configuring pyspark.
Python 3.7.6 (default, Jan 8 2020, 20:23:39)
[MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
20/05/24 13:33:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.0.0-preview2
      /_/
Using Python version 3.7.6 (default, Jan 8 2020 20:23:39)
SparkSession available as 'spark'.
>>> 20/05/24 13:34:05 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped.
Can anyone help me resolve this, or point me to a blog post on configuring pyspark for Jupyter notebooks?
Answer 0 (score: 0)
You can try setting the following environment variables in Windows:

PYSPARK_DRIVER_PYTHON to jupyter
PYSPARK_DRIVER_PYTHON_OPTS to notebook

Once these environment variables are set, typing pyspark in cmd will launch a Jupyter notebook directly, with pyspark already configured in it.
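As a sketch, the two variables above can be set from a cmd prompt like this (this assumes pyspark is already on your PATH; the pyspark launcher reads these variables to decide which Python driver program to start):

```shell
:: Set for the current cmd session only
set PYSPARK_DRIVER_PYTHON=jupyter
set PYSPARK_DRIVER_PYTHON_OPTS=notebook

:: Or persist them across sessions (takes effect in NEW cmd windows)
setx PYSPARK_DRIVER_PYTHON jupyter
setx PYSPARK_DRIVER_PYTHON_OPTS notebook

:: Launching pyspark now opens a Jupyter notebook instead of the REPL
pyspark
```

Alternatively, the same variables can be set permanently through the System Properties > Environment Variables dialog.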