For the past few days I have been hitting a strange error that I cannot resolve.
I am using PySpark and trying to load a CSV into a DataFrame [code below], and it keeps throwing the same error:
py4j.protocol.Py4JJavaError: An error occurred while calling o10.textFile. : java.lang.reflect.InaccessibleObjectException: Unable to make field transient java.lang.Object[] java.util.ArrayList.elementData accessible: module java.base does not "opens java.util" to unnamed module
What I actually wanted to do in the first place is convert the CSV to Parquet, and even that path hits the same error. (The conversion to Parquet succeeds, but when I then try to print the schema of some columns of the resulting table, or count its rows, it fails with the same error as above; a rough sketch of the conversion step follows the Parquet code below.)
The CSV-to-DataFrame code:
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *
import os

os.environ['SPARK_HOME'] = "/opt/apps/spark-2.0.1-bin-hadoop2.7/"

sc = SparkContext(master='local')
sqlContext = SQLContext(sc)

# Read the CSV as plain text and split each line on commas
Employee_rdd = sc.textFile("abc.csv").map(lambda line: line.split(","))
Employee_df = Employee_rdd.toDF()
Employee_df.show()
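Since the exception message mentions module java.base, which only exists on Java 9 and newer, here is a minimal diagnostic sketch for checking which Java version the gateway JVM is actually running (note that sc._jvm is an internal Py4J handle, not a stable public API, so this is purely a debugging aid):

# Diagnostic sketch: ask the JVM behind the Py4J gateway for its version.
# Assumes `sc` is the SparkContext created above; sc._jvm is PySpark-internal.
print(sc._jvm.java.lang.System.getProperty("java.version"))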
The error stack trace:
File "/home/v/scripts/g_s_pipe/a.py", line 14, in Employee_rdd = sc.textFile("abc.csv").map(lambda line: line.split(",")) File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/pyspark/context.py", line 476, in textFile return RDD(self._jsc.textFile(name, minPartitions), self, File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__ File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco return f(*a, **kw) File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o10.textFile. : java.lang.reflect.InaccessibleObjectException: Unable to make field transient java.lang.Object[] java.util.ArrayList.elementData accessible: module java.base does not "opens java.util" to unnamed module @51b63e70 at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:335) at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:278) at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:175) at java.base/java.lang.reflect.Field.setAccessible(Field.java:169)
The Parquet code (reading the converted file back and counting rows):
import os
os.environ['SPARK_HOME'] = "/opt/apps/spark-2.0.1-bin-hadoop2.7/"

from pyspark import SparkContext
from pyspark.sql import functions as F
from pyspark.sql import *
from pyspark.sql.types import *

sc = SparkContext(master='local')
sqlContext = SQLContext(sc)

# Read back the Parquet file produced by the conversion and count its rows
df = sqlContext.read.parquet("tract_alpha.parquet")
print(df.count())
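For reference, the CSV-to-Parquet conversion itself is not shown in this post; it is roughly the following sketch (the file names match the paths used above, and header=True is an assumption about the CSV):

# Rough sketch of the conversion step described earlier (exact code omitted
# from this post). sqlContext.read.csv is the built-in CSV reader in Spark 2.0+.
csv_df = sqlContext.read.csv("abc.csv", header=True)
csv_df.write.parquet("tract_alpha.parquet")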
It fails with the same error:
17/06/12 21:17:21 WARN BlockManager: Putting block broadcast_1 failed due to an exception
17/06/12 21:17:21 WARN BlockManager: Block broadcast_1 could not be removed as it was not found on disk or in memory
Traceback (most recent call last):
  File "/home/vna/scripts/global_score_pipeline/test_code_here.py", line 62, in <module>
    print (df.count())
  File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/pyspark/sql/dataframe.py", line 299, in count
    return int(self._jdf.count())
  File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/apps/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o25.count.
: java.lang.reflect.InaccessibleObjectException: Unable to make field transient java.lang.Object[] java.util.ArrayList.elementData accessible: module java.base does not "opens java.util" to unnamed module @5e37932e
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:335)
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:278)
	at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:175)
	at java.base/java.lang.reflect.Field.setAccessible(Field.java:169)
	at org.apache.spark.util.SizeEstimator$$anonfun$getClassInfo$3.apply(SizeEstimator.scala:336)
	at org.apache.spark.util.SizeEstimator$$anonfun$getClassInfo$3.apply(SizeEstimator.scala:330)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at org.apache.spark.util.SizeEstimator$.getClassInfo(SizeEstimator.scala:330)
What is this InaccessibleObjectException? I cannot find anything helpful about it on Google. How do I fix this?
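From the wording module java.base does not "opens java.util" to unnamed module, my guess is that this is the Java 9+ module system blocking the reflective access that Spark's SizeEstimator performs, i.e. Spark 2.0.1 (which targets Java 7/8) is being launched on a newer JVM. If that guess is right, a minimal workaround sketch would be to point the driver at a Java 8 installation before creating the SparkContext (the path below is a placeholder for wherever Java 8 is installed on the machine):

# Hypothetical workaround sketch (untested): make the Spark launcher scripts
# pick a Java 8 JVM. The path is a placeholder; set it before SparkContext().
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk"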