Two days ago I could run basic PySpark operations. Now the Spark context sc is unavailable. I have tried several blog posts, but nothing works. I currently have Python 3.6.6, Java 1.8.0_231, and Apache Spark (with Hadoop) spark-3.0.0-preview-bin-hadoop2.7.
I am trying to run a simple command in a Jupyter notebook:
data = sc.textfile('airline.csv')
and I get the following error:
NameError Traceback (most recent call last)
<ipython-input-2-572751a2bc2a> in <module>
----> 1 data = sc.textfile('airline.csv')
NameError: name 'sc' is not defined
I have set the following system variables:
HADOOP_HOME = C:\spark-3.0.0-preview-bin-hadoop2.7
PYSPARK_DRIVER_PYTHON = ipython
PYSPARK_DRIVER_PYTHON_OPTS = notebook
SPARK_HOME = C:\spark-3.0.0-preview-bin-hadoop2.7
(Java and Python system variables are already set)
path = C:\spark-3.0.0-preview-bin-hadoop2.7\bin (I have placed winutils.exe in this folder)
If I now remove the PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS variables and run pyspark from the command prompt, I get the following error:
C:\spark-3.0.0-preview-bin-hadoop2.7>pyspark
Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
19/12/25 23:28:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/12/25 23:28:42 WARN Utils: Service 'sparkDriver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
19/12/25 23:28:42 WARN Utils: Service 'sparkDriver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
19/12/25 23:28:42
I also tried to find a workaround for this, but could not solve it. Please help.
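Not from the original thread, but relevant to the bind warnings above: the "Service 'sparkDriver' could not bind" messages often go away if the driver is pinned to the loopback address before launching pyspark. A hedged sketch for the Windows command prompt, assuming the standard SPARK_LOCAL_IP environment variable that Spark reads at startup:

```shell
:: Pin the Spark driver to localhost so it does not try to bind
:: to a network interface that blocks it (e.g. on VPN or home Wi-Fi).
set SPARK_LOCAL_IP=127.0.0.1
pyspark
```

Equivalently, spark.driver.bindAddress can be set in code or in spark-defaults.conf.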
Answer 0 (score: 0)
I don't know why, but this is how it worked. I am using my company's laptop. When I connect to the company network via Pulse Secure, my Spark context connects successfully; when I connect to my home network, it does not.
Strange, but that is how it worked for me.
Answer 1 (score: 0)
@GiovaniSalazar is right. You need to import
from pyspark.sql import SparkSession
and define sc:
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext
然后参考sc
data = sc.textFile('airline.csv')
in your case (note the method is textFile, with a capital F, not textfile).