NullPointerException converting a PySparkling H2OFrame to a Spark DataFrame

Time: 2018-02-23 15:46:59

Tags: python apache-spark pyspark h2o sparkling-water

pysparkling 2.1

I run the following code:

from pysparkling import H2OContext
import h2o

hc = H2OContext.getOrCreate(spark)  # spark: an existing SparkSession
h2o_frame = h2o.import_file('hdfs:path/to/my/file.csv')
spark_frame = hc.as_spark_frame(h2o_frame)

It works fine, just as in the documentation.

But when I try the following code:

hc = H2OContext.getOrCreate(spark)
h2o_frame = h2o.H2OFrame(some_list)  # some_list: a Python list of tuples, shown under Edit
spark_frame = hc.as_spark_frame(h2o_frame)  # error at this line

I get the following error:

File "/my_path/my_file.py", line 530, in _convert_and_append
spark_frame = hc.as_spark_frame(h2o_frame)  
File "/my_path/.virtualenv/lib/python2.7/site-packages/pysparkling/context.py", line 196, in as_spark_frame
j_h2o_frame = h2o_frame.get_java_h2o_frame()
File "/my_path/.virtualenv/lib/python2.7/site-packages/pysparkling/context.py", line 38, in get_java_h2o_frame
self._java_frame = hc._jhc.asH2OFrame(self.frame_id)
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o67.asH2OFrame.
 : java.lang.NullPointerException
    at water.fvec.H2OFrame.<init>(H2OFrame.scala:38)
    at water.fvec.H2OFrame.<init>(H2OFrame.scala:46)
    at org.apache.spark.h2o.H2OContext.asH2OFrame(H2OContext.scala:234)
    at org.apache.spark.h2o.JavaH2OContext.asH2OFrame(JavaH2OContext.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)

The only difference is how I initialize the H2OFrame. What causes this difference? Is there something I'm missing? Why does it matter how the H2OFrame was created?
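One plausible reading of the trace: `asH2OFrame` receives only `self.frame_id` (pysparkling `context.py`, line 38), and the NullPointerException is thrown inside the Scala `H2OFrame` constructor, which suggests the JVM side could not find a frame under that key. A minimal diagnostic sketch, assuming the H2O Python client is connected to the same cluster as `hc`, is to check whether each frame's `frame_id` appears in the cluster's key list. The helper names `frame_registered` and `check_frame` are hypothetical, and the shape of the `h2o.ls()` result (a pandas DataFrame with a `key` column) is an assumption worth verifying on your version.

```python
def frame_registered(frame_id, cluster_keys):
    """Return True if frame_id is among the keys the cluster reports."""
    return frame_id in set(cluster_keys)


def check_frame(h2o_frame):
    """Hypothetical diagnostic: is this frame's id visible in the H2O key store?"""
    # Imported lazily so frame_registered can be used without an H2O install.
    import h2o
    # Assumption: h2o.ls() returns a pandas DataFrame with a "key" column.
    keys = h2o.ls()["key"].tolist()
    return frame_registered(h2o_frame.frame_id, keys)
```

Calling `check_frame` on both the `import_file` frame and the `H2OFrame(some_list)` frame, and printing both `frame_id` values, would show whether the failing frame is simply not registered under the id pysparkling hands to the JVM.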

Any help is appreciated.

Edit

some_list

<type 'list'>
[(u'petal_length', 15800.0, 1.0, 0.42857966682031484), (u'petal_width', 14200.0, 0.8987341772151899, 0.3851791942309159), (u'sepal_length', 4808.2783203125, 0.30432141267800633, 0.13042596965182748), (u'sepal_width', 2057.6796875, 0.13023289161392404, 0.05581516929694175), (u'id', 0.0, 0.0, 0.0)]
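`some_list` is a list of 4-tuples with no header. Two hedged sketches follow. First, a pure-Python helper pivots it into an ordered column-name-to-values mapping, a form `h2o.H2OFrame` also accepts (it takes a dict as `python_obj`); whether this avoids the NullPointerException is untested. Second, if the end goal is only a Spark DataFrame, `SparkSession.createDataFrame` can build it straight from the list of tuples and skip the H2O round trip. The column names below are hypothetical, since the original list does not name its columns.

```python
from collections import OrderedDict

# Hypothetical column names: some_list carries no header of its own.
COLUMNS = ["variable", "relative_importance", "scaled_importance", "percentage"]


def rows_to_columns(rows, columns=COLUMNS):
    """Pivot equal-length tuples into an ordered {column_name: [value, ...]}."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row %r has %d fields, expected %d"
                             % (row, len(row), len(columns)))
    # OrderedDict keeps column order stable, including on Python 2.7.
    return OrderedDict((name, [row[i] for row in rows])
                       for i, name in enumerate(columns))


def to_spark_directly(spark, rows, columns=COLUMNS):
    """Build a Spark DataFrame from the tuples without going through H2O."""
    # createDataFrame accepts a list of tuples plus a list of column names;
    # column types are inferred from the data.
    return spark.createDataFrame(rows, columns)
```

With these, `h2o.H2OFrame(rows_to_columns(some_list))` hands H2O explicitly named columns, and `to_spark_directly(spark, some_list)` returns a Spark DataFrame with the same four columns.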
