For context - I'm upgrading from Spark 2.1.1 to 2.3.1, and I have a custom Spark context written in Scala that I'm using from Python via py4j/pyspark.
I have the following Python code, which stopped working in 2.3.1:
from py4j.java_gateway import JavaObject, java_import
from py4j.protocol import DEFAULT_JVM_ID

def get_imports(self):
    # DEFAULT_JVM_ID[1:] is the bare object id of the gateway's default JVM view
    java_jvm_view = JavaObject(DEFAULT_JVM_ID[1:], self._sparkSession._jvm._gateway_client)
    imports = list(java_jvm_view.getSingleImportsMap().values()) + list(java_jvm_view.getStarImports())
    return imports
def _get_ctx(self):
    print("[Before Import]")
    print(self.get_imports())
    # Register the class with the gateway's default JVM view
    java_import(self._sparkSession._jvm, "com.myCompany.MyObject")
    print("[After Import]")
    print(self.get_imports())
    print("[com.myCompany.MyObject]")
    print(self._sparkSession._jvm.com.myCompany.MyObject)
    self._sparkSession._jvm.com.myCompany.MyObject.apply(self._sparkSession._jsparkSession)
This outputs:
[Before Import]
scala.Tuple2
org.apache.spark.SparkConf
java.lang
org.apache.spark.api.java
org.apache.spark.api.python
org.apache.spark.ml.python
org.apache.spark.mllib.api.python
org.apache.spark.sql
org.apache.spark.sql.api.python
org.apache.spark.sql.hive
[After Import]
scala.Tuple2
com.myCompany.MyObject
org.apache.spark.SparkConf
java.lang
org.apache.spark.api.java
org.apache.spark.api.python
org.apache.spark.ml.python
org.apache.spark.mllib.api.python
org.apache.spark.sql
org.apache.spark.sql.api.python
org.apache.spark.sql.hive
[com.myCompany.MyObject]
<py4j.java_gateway.JavaPackage object at 0x7a7a7a112bd0>
When I run this on 2.1.1, the JVM returns 'com.myCompany.MyObject' as a JavaClass rather than a JavaPackage. The code has not been changed in any way that should alter the package structure.
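
As a sanity check, here is a minimal probe of what py4j hands back, assuming the standard py4j types (my understanding is that attribute access on _jvm yields a JavaClass when the Java-side reflection can resolve the name, and a lazy JavaPackage placeholder when it can't):

from py4j.java_gateway import JavaClass, JavaPackage

obj = self._sparkSession._jvm.com.myCompany.MyObject
# 2.1.1 gives True False here; 2.3.1 gives False True
print(isinstance(obj, JavaClass), isinstance(obj, JavaPackage))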
I'm trying to figure out why it now returns a JavaPackage, and I'd also like to know how to index into the returned package's contents so that I can see what it actually contains.
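
Since a JavaPackage seems to be just a lazy placeholder that can't be enumerated from Python, the closest diagnostic I can think of is asking the JVM directly whether it can load a given fully-qualified name. A rough sketch (the helper name is mine, and the trailing-$ form is just the usual Scala-object companion class):

from py4j.protocol import Py4JError

def jvm_can_load(jvm, fqcn):
    # Ask the driver JVM to load the class by name; this doesn't list the
    # package, but it shows whether a given name is visible to the gateway.
    try:
        jvm.java.lang.Class.forName(fqcn)
        return True
    except Py4JError:
        return False

print(jvm_can_load(self._sparkSession._jvm, "com.myCompany.MyObject"))
print(jvm_can_load(self._sparkSession._jvm, "com.myCompany.MyObject$"))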