Sparkling Water - Running a Python script as a Spark application

Time: 2016-04-12 20:26:14

Tags: python pyspark h2o sparkling-water

I am having trouble running a Python script as a Spark application with Sparkling Water. I use this command to execute my script on Spark:

  

    ./bin/spark-submit \
      --packages ai.h2o:sparkling-water-core_2.10:1.5.12 \
      --py-files $SPARKLING_HOME/py/dist/pySparkling-1.5.12-py2.7.egg $SPARKLING_HOME/Python/test.py

I get the following error:

  

    py4j.protocol.Py4JError: Trying to call a package.

Log:

> Traceback (most recent call last):
  File "/Users/Documents/sparkling-water-1.5.12/Python/test.py", line 5, in <module>
    hc= H2OContext(sc).start()
  File "/Users/Documents/sparkling-water-1.5.12/py/dist/pySparkling-1.5.12-py2.7.egg/pysparkling/context.py", line 72, in __init__
  File "/Users/Documents/sparkling-water-1.5.12/py/dist/pySparkling-1.5.12-py2.7.egg/pysparkling/context.py", line 96, in _do_init
  File "/Users/Documents/spark-1.5.2-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 726, in __getattr__
py4j.protocol.Py4JError: Trying to call a package.
16/04/11 16:58:39 INFO SparkContext: Invoking stop() from shutdown hook
16/04/11 16:58:39 INFO SparkUI: Stopped Spark web UI at http://192.168.181.84:4042
16/04/11 16:58:39 INFO DAGScheduler: Stopping DAGScheduler
16/04/11 16:58:39 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/04/11 16:58:39 INFO MemoryStore: MemoryStore cleared
16/04/11 16:58:39 INFO BlockManager: BlockManager stopped
16/04/11 16:58:39 INFO BlockManagerMaster: BlockManagerMaster stopped
16/04/11 16:58:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/04/11 16:58:39 INFO SparkContext: Successfully stopped SparkContext
16/04/11 16:58:39 INFO ShutdownHookManager: Shutdown hook called
16/04/11 16:58:39 INFO ShutdownHookManager: Deleting directory /private/var/fold

How can I fix this? I followed the commands in the booklet exactly: https://h2o-release.s3.amazonaws.com/h2o/rel-turan/3/docs-website/h2o-docs/booklets/SparklingWaterVignette.pdf

1 answer:

Answer 0: (score: 2)

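This is actually a critical bug that we on the Sparkling Water team are aware of. It has already been fixed (https://0xdata.atlassian.net/browse/SW-107), and a new release containing this fix along with others should be out soon.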

I will keep you posted and let you know when the new version is released.

Edit, April 29, 2016

New releases containing the fix have been published.

For Spark 1.6 - http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.6/3/index.html

For Spark 1.5 - http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.5/14/index.html

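You no longer need to pass --packages to add sparkling-water-core. The pySparkling egg file already contains all the Java/Scala classes it needs. All you have to do is point the --py-files option at the egg file, and that should be it.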
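
To make the change concrete, here is a minimal sketch in Python of the shortened invocation: it only assembles the spark-submit argument list, with --py-files and no --packages. The helper name `build_submit_command`, the default install paths, and the egg file name `pySparkling.egg` are placeholders I made up for illustration. Substitute the actual egg name shipped in the release you downloaded.

```python
import os
import shlex


def build_submit_command(spark_home, egg_path, script_path):
    """Return the spark-submit argv for a pySparkling script.

    Only --py-files is passed; --packages is deliberately left out,
    since the fixed egg is said to bundle the JVM classes it needs.
    """
    return [
        os.path.join(spark_home, "bin", "spark-submit"),
        "--py-files", egg_path,
        script_path,
    ]


if __name__ == "__main__":
    # Hypothetical locations; replace with your own install paths.
    sparkling_home = os.environ.get("SPARKLING_HOME", "/opt/sparkling-water")
    spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
    cmd = build_submit_command(
        spark_home,
        # Placeholder egg name; use the file found under py/dist in your release.
        os.path.join(sparkling_home, "py", "dist", "pySparkling.egg"),
        os.path.join(sparkling_home, "Python", "test.py"),
    )
    # Print a copy-pasteable shell command instead of launching Spark.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Running the printed command in a shell (or passing the list to `subprocess.check_call`) should submit the script. If the "Trying to call a package" error still appears, it is worth checking that the egg really comes from one of the fixed releases linked above.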