I ran into a problem with the new pyspark.ml.image feature in Spark 2.3.
Using ImageSchema.toNDArray() in a "local computation" works fine. But using it inside rdd.map() raises an error:

AttributeError: 'NoneType' object has no attribute '_jvm'

You can try the following code in pyspark after preparing some pictures in a folder named "jpg". For example, I put this single picture into it.

It works in a "local computation":
>>> from pyspark.ml.image import ImageSchema
>>> df = ImageSchema.readImages("jpg")
>>> row = df.collect()[0] # collect() to a "local" list and take the first
>>> ImageSchema.toNDArray(row.image) # so this toNDArray() is a "local computation"
array([[[228, 141, 97],
[229, 142, 98],
[229, 142, 98],
...,
[239, 157, 110],
[239, 157, 110],
[239, 157, 109]],
...
...
[[ 66, 38, 21],
[ 66, 38, 21],
[ 66, 38, 21],
...,
[ 91, 55, 37],
[ 94, 57, 37],
[ 94, 57, 37]]], dtype=uint8)
But if I put it inside rdd.map(), it raises:

AttributeError: 'NoneType' object has no attribute '_jvm'
>>> from pyspark.ml.image import ImageSchema
>>> df = ImageSchema.readImages("jpg")
>>> df.rdd.map(lambda row: ImageSchema.toNDArray(row.image)).take(1)
...
...
File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/ml/image.py", line 123, in toNDArray
if any(not hasattr(image, f) for f in self.imageFields):
File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/ml/image.py", line 90, in imageFields
if self._imageFields is None:
ctx = SparkContext._active_spark_context
self._imageFields = list(ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageFields())
AttributeError: 'NoneType' object has no attribute '_jvm'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:298)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:438)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:421)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
...
...
This has been tested and is reproducible on:
Spark 2.3.0 provided by Cloudera parcel
Spark 2.3.0 on Hortonworks
Spark 2.3.0 on Windows with WinUtils
Spark 2.3.1 on Windows with WinUtils
What is wrong?
How can I fix it?
Answer 0 (score: 0)
I think this is a bug in pyspark.ml.image, because if you change every line in .../lib/spark2/python/pyspark/ml/image.py

from

ctx = SparkContext._active_spark_context

to

ctx = SparkContext.getOrCreate()

then everything works.
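The traceback hints at why: the lambda runs in executor-side Python workers, where SparkContext._active_spark_context is None. For context, the change lands in the imageFields property shown in the traceback; patched, it looks roughly like this (a sketch reconstructed from the traceback above, not a verbatim copy of the Spark source):

@property
def imageFields(self):
    if self._imageFields is None:
        ctx = SparkContext.getOrCreate()  # was: SparkContext._active_spark_context
        self._imageFields = list(ctx._jvm.org.apache.spark.ml.image.ImageSchema.imageFields())
    return self._imageFields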
However, I am not an expert in pyspark, so I think it is better to leave this open for discussion before an answer is selected.
P.S. I am not saying it should be fixed this way. I just think this may be a bug.
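If you would rather not patch the Spark installation, a workaround is to rebuild the NumPy array from the image struct's own fields, which needs no JVM call. This is a minimal sketch that assumes the uint8 data that ImageSchema.readImages produces; image_to_ndarray is just a helper name I made up:

import numpy as np

def image_to_ndarray(image):
    # The image struct carries its own shape metadata (height, width,
    # nChannels) plus the raw pixel bytes in image.data, so reshaping
    # needs no SparkContext and is safe inside rdd.map() on executors.
    return np.ndarray(
        shape=(image.height, image.width, image.nChannels),
        dtype=np.uint8,
        buffer=image.data,
        strides=(image.width * image.nChannels, image.nChannels, 1))

df.rdd.map(lambda row: image_to_ndarray(row.image)).take(1)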