I am running a standalone Apache Spark 2.0.0 cluster with two nodes, and I have not installed Hive. I get the following error when creating a DataFrame.
from pyspark import SparkContext
from pyspark import SQLContext
sqlContext = SQLContext(sc)
l = [('Alice', 1)]
sqlContext.createDataFrame(l).collect()
---------------------------------------------------------------------------
IllegalArgumentException Traceback (most recent call last)
<ipython-input-9-63bc4f21f23e> in <module>()
----> 1 sqlContext.createDataFrame(l).collect()
/home/mok/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/context.pyc in createDataFrame(self, data, schema, samplingRatio)
297 Py4JJavaError: ...
298 """
--> 299 return self.sparkSession.createDataFrame(data, schema, samplingRatio)
300
301 @since(1.3)
/home/mok/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/session.pyc in createDataFrame(self, data, schema, samplingRatio)
522 rdd, schema = self._createFromLocal(map(prepare, data), schema)
523 jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
--> 524 jdf = self._jsparkSession.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
525 df = DataFrame(jdf, self._wrapped)
526 df._schema = schema
/home/mok/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
931 answer = self.gateway_client.send_command(command)
932 return_value = get_return_value(
--> 933 answer, self.gateway_client, self.target_id, self.name)
934
935 for temp_arg in temp_args:
/home/mok/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.pyc in deco(*a, **kw)
77 raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
78 if s.startswith('java.lang.IllegalArgumentException: '):
---> 79 raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
80 raise
81 return deco
IllegalArgumentException: u'Unable to locate hive jars to connect to metastore. Please set spark.sql.hive.metastore.jars.'
Should I install Hive, or edit the configuration?
Answer 0 (score: 8)
IllegalArgumentException: u'Unable to locate hive jars to connect to metastore. Please set spark.sql.hive.metastore.jars.'
I ran into the same problem and fixed it by switching to Java 8. Make sure JDK 8 is installed and that the environment variables point to it.
Do not use Java 11 with Spark / pyspark 2.4.
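To check which JVM is actually on your PATH before launching Spark, here is a small sketch (the `major_of` helper is my own, not part of Spark): it parses both the pre-JDK-9 version scheme ("1.8.0_252") and the newer one ("11.0.6").

```shell
# major_of: extract the Java major version from a `java -version` string.
major_of() {
  case "$1" in
    1.*) echo "${1#1.}" | cut -d. -f1 ;;   # pre-JDK-9 scheme: "1.8.0_252" -> 8
    *)   echo "$1" | cut -d. -f1 ;;        # new scheme:       "11.0.6"    -> 11
  esac
}

# Read the version of the JVM currently on PATH and warn if it is not JDK 8.
ver=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')
if [ "$(major_of "$ver")" = "8" ]; then
  echo "OK: JDK 8 ($ver)"
else
  echo "WARNING: Spark 2.x expects JDK 8, found: $ver"
fi
```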
Answer 1 (score: 2)
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
did the trick.
Answer 2 (score: 2)
If you have several Java versions installed, you will have to figure out which one Spark uses (I found it by trial and error), starting with
JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
and ending up with
JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
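On a Debian/Ubuntu system like the one above, one way to make that switch stick is sketched below (assuming both JDK packages are already installed; `update-alternatives` changes the system default interactively):

```shell
# List the installed JDKs and pick the default interactively (Debian/Ubuntu).
sudo update-alternatives --config java

# Or point just the current shell at JDK 8 before launching pyspark:
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
export PATH="$JAVA_HOME/bin:$PATH"
```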
Answer 3 (score: 2)
If multiple JDKs are installed, you can list the Java home directories as shown below and then set JAVA_HOME to the 1.8 installation:
/usr/libexec/java_home -V
Matching Java Virtual Machines (3):
13.0.2, x86_64: "OpenJDK 13.0.2" /Library/Java/JavaVirtualMachines/adoptopenjdk-13.0.2.jdk/Contents/Home
11.0.6, x86_64: "AdoptOpenJDK 11" /Library/Java/JavaVirtualMachines/adoptopenjdk-11.jdk/Contents/Home
1.8.0_252, x86_64: "AdoptOpenJDK 8" /Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
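The answer does not spell out the final export command; a likely form on macOS (an assumption on my part, using the lowercase `-v` version filter of `java_home`) is:

```shell
# Resolve the home of the installed 1.8 JDK and export it for this shell.
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
```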
Answer 4 (score: 0)
Please make sure the JAVA_HOME environment variable is set.
For Mac OS I did echo export JAVA_HOME=/Library/Java/Home >> ~/.bash_profile
and then source ~/.bash_profile
Or open ~/.bash_profile and add the line above yourself.