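I am following this tutorial to classify images with Apache Spark. The article seems somewhat out of date, but the process should be very similar. It also says Python 2.7, but as far as I know sparkdl only works with Python 3, so I am using Python 3.6 installed via Homebrew. I also installed the latest versions of all the packages with pip. At runtime it emits a lot of deprecation warnings, but I don't think those are the problem. What should I do to debug this further? I am using Python 3.6.0 and Spark 2.4.0 on a MacBook. My feeling is that py4j does not like the newer versions of the other installed packages. Below is the console output with the deprecation warnings removed. The source code can be found here.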
Using Python version 3.6.0 (v3.6.0:41df79263a11, Dec 22 2016 17:23:13)
SparkSession available as 'spark'.
>>> exec(open("imageImporter.py").read())
Using TensorFlow backend.
>>> exec(open("modelCreator.py").read())
INFO:tensorflow:Froze 376 variables.
2019-03-04 18:11:59,059 INFO (MainThread-87989) Froze 376 variables.
INFO:tensorflow:Converted 376 variables to const ops.
2019-03-04 18:11:59,434 INFO (MainThread-87989) Converted 376 variables to const ops.
[Stage 2:> (0 + 1) / 1]Using TensorFlow backend.
Using TensorFlow backend.
INFO:tensorflow:Froze 0 variables.
2019-03-04 18:12:19,950 INFO (MainThread-87989) Froze 0 variables.
INFO:tensorflow:Converted 0 variables to const ops.
2019-03-04 18:12:20,129 INFO (MainThread-87989) Converted 0 variables to const ops.
2019-03-04 18:12:20,966 INFO (MainThread-87989) Fetch names: ['sdl_flattened_mixed10/concat:0']
2019-03-04 18:12:20,966 INFO (MainThread-87989) Spark context = <SparkContext master=local[*] appName=PySparkShell>
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 8, in <module>
File "/usr/local/Cellar/apache-spark/2.4.0/libexec/python/pyspark/ml/base.py", line 132, in fit
return self._fit(dataset)
File "/usr/local/Cellar/apache-spark/2.4.0/libexec/python/pyspark/ml/pipeline.py", line 107, in _fit
dataset = stage.transform(dataset)
File "/usr/local/Cellar/apache-spark/2.4.0/libexec/python/pyspark/ml/base.py", line 173, in transform
return self._transform(dataset)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sparkdl/transformers/named_image.py", line 158, in _transform
return transformer.transform(dataset)
File "/usr/local/Cellar/apache-spark/2.4.0/libexec/python/pyspark/ml/base.py", line 173, in transform
return self._transform(dataset)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sparkdl/transformers/named_image.py", line 221, in _transform
result = tfTransformer.transform(dataset.withColumn(resizedCol, resizeUdf(inputCol)))
File "/usr/local/Cellar/apache-spark/2.4.0/libexec/python/pyspark/ml/base.py", line 173, in transform
return self._transform(dataset)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sparkdl/transformers/tf_image.py", line 137, in _transform
"image_buffer": "__sdl_image_data"})
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorframes/core.py", line 264, in map_rows
return _map(fetches, dframe, feed_dict, block=False, trim=None, initial_variables=initial_variables)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorframes/core.py", line 150, in _map
builder = _java_api().map_rows(dframe._jdf)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorframes/core.py", line 34, in _java_api
return _jvm.Thread.currentThread().getContextClassLoader().loadClass(javaClassName) \
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/py4j/java_gateway.py", line 1286, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/local/Cellar/apache-spark/2.4.0/libexec/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o162.loadClass.
: java.lang.ClassNotFoundException: org.tensorframes.impl.DebugRowOps
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
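I previously tried the same thing with older versions of sparkdl, py4j, and tensorflow, but got a very similar error. I also tried Python 2.7, but it did not even get that far, and I would like to keep everything as up to date as possible.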
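
One debugging step that might narrow this down is to repeat, in isolation, the call that fails in the traceback: asking the driver JVM's context class loader for `org.tensorframes.impl.DebugRowOps`. The sketch below is only that, a minimal check. The helper name `jvm_can_load` is made up for illustration, and it relies on the private `_jvm` attribute of the SparkContext, the same py4j gateway that `tensorframes/core.py` appears to use in the traceback. If it returns False, that would suggest the tensorframes JAR is not on the driver's classpath. That is an assumption about the cause, not something the output proves.

```python
def jvm_can_load(spark, class_name):
    """Return True if the driver JVM's context class loader can load class_name.

    `spark` is expected to be the SparkSession the pyspark shell provides.
    This helper is hypothetical and not part of sparkdl or tensorframes.
    """
    # _jvm is a private attribute: the py4j view of the driver JVM.
    jvm = spark.sparkContext._jvm
    loader = jvm.Thread.currentThread().getContextClassLoader()
    try:
        loader.loadClass(class_name)
        return True
    except Exception:
        # py4j raises a Python exception when the Java side throws,
        # for example on ClassNotFoundException.
        return False


# Intended use inside the pyspark shell, where `spark` already exists:
#   jvm_can_load(spark, "org.tensorframes.impl.DebugRowOps")
#   spark.sparkContext.getConf().get("spark.jars", "")
#   spark.sparkContext.getConf().get("spark.jars.packages", "")
```

The two `getConf().get(...)` lines only show which JARs and packages the session was started with. Empty values would be consistent with the Python packages being installed through pip while the matching JVM artifacts were never handed to Spark.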