I'm taking a Pluralsight course on Spark. I downloaded the pre-built Spark 2.4.3 for Hadoop 2.7 and am running it in standalone mode on my laptop. I don't have Hadoop or Hive installed on my laptop.
One of the lessons covers Spark SQL, with code like this:
from datetime import datetime
from pyspark.sql import Row

record = sc.parallelize([Row(id=1,
                             name='Jill',
                             active=True,
                             clubs=['chess', 'hockey'],
                             subjects={'math': 80, 'english': 56},
                             enrolled=datetime(2014, 8, 1, 14, 1, 5)),
                         Row(id=2,
                             name='George',
                             active=False,
                             clubs=['chess', 'soccer'],
                             subjects={'math': 60, 'english': 96},
                             enrolled=datetime(2015, 3, 21, 8, 2, 5))])
record_df = record.toDF()
record_df.createOrReplaceTempView('records')
sqlContext.sql('select * from records')
The last statement gives me the following error:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
~/software/dev/spark-2.4.3-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
~/software/dev/spark-2.4.3-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
Py4JJavaError: An error occurred while calling o25.sql.
: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
....
Could you help me with this? Do I need Hive and Hadoop to make this work? (In the video, the instructor doesn't use either of them.)
P.S. I'm very new to this topic.