I am trying to create a UDF in the interactive shell of a locally running PySpark, but I can't:
Using Python version 3.5.1 (v3.5.1:37a07cee5969, Dec 6 2015 01:38:48)
SparkContext available as sc, HiveContext available as sqlContext.
>>> from pyspark.sql.functions import udf
>>> from pyspark.sql.types import StringType
>>> udf_func = udf(lambda x: 'hello', StringType())
After the above, I get the following error:
16/05/16 22:03:19 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/05/16 22:03:20 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/05/16 22:03:25 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/05/16 22:03:26 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Traceback (most recent call last):
File "C:\spark-1.6.0-bin-hadoop2.6\python\pyspark\sql\context.py", line 686, in _ssql_ctx
self._scala_HiveContext = self._get_hive_ctx()
File "C:\spark-1.6.0-bin-hadoop2.6\python\pyspark\sql\context.py", line 694, in _get_hive_ctx
return self._jvm.HiveContext(self._jsc.sc())
File "C:\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\java_gateway.py", line 1064, in __call__
File "C:\spark-1.6.0-bin-hadoop2.6\python\pyspark\sql\utils.py", line 45, in deco
return f(*a, **kw)
File "C:\spark-1.6.0-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 21 more
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\spark-1.6.0-bin-hadoop2.6\python\pyspark\sql\functions.py", line 1597, in udf
return UserDefinedFunction(f, returnType)
File "C:\spark-1.6.0-bin-hadoop2.6\python\pyspark\sql\functions.py", line 1558, in __init__
self._judf = self._create_judf(name)
File "C:\spark-1.6.0-bin-hadoop2.6\python\pyspark\sql\functions.py", line 1569, in _create_judf
jdt = ctx._ssql_ctx.parseDataType(self.returnType.json())
File "C:\spark-1.6.0-bin-hadoop2.6\python\pyspark\sql\context.py", line 691, in _ssql_ctx
"build/sbt assembly", e)
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError('An err
or occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o21))
Any idea what went wrong? I have already checked this link: Multiple Spark applications with HiveContext. It suggests the problem occurs because of running multiple Spark applications. I am not running multiple Spark applications; I am only running this interactive shell.
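For reference, the traceback seems to point at the lazy creation of the JVM HiveContext (the `_ssql_ctx` / `_get_hive_ctx` frames) rather than at the UDF itself, so I believe the same error should be reproducible without any UDF. A minimal check, assuming the shell-provided `sqlContext` is the HiveContext shown in the startup banner:

>>> # Forces creation of the underlying JVM HiveContext, the same code path
>>> # as _ssql_ctx in the traceback above, with no UDF involved.
>>> sqlContext.range(1).collect()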