Question

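I am trying to connect to an AWS Glue development endpoint using zeppelin-0.8.0, and the following error occurs when I execute a cell. There is no useful message indicating what the problem might be. Any leads are appreciated.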

172318_1906434757 is finished, status: ERROR, exception: java.lang.RuntimeException: org.apache.thrift.TApplicationException: Internal error processing createInterpreter, result: %text org.apache.thrift.TApplicationException: Internal error processing createInterpreter
        at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
        at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_createInterpreter(RemoteInterpreterService.java:209)
        at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.createInterpreter(RemoteInterpreterService.java:192)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:169)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:165)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:165)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:132)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:299)
        at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:407)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
        at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

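Update: Per the answer below, it seems 0.8.0 does not yet work with Glue. I also had trouble getting 0.7.x to run properly: the javax.ws.rx package throws a bunch of MethodNotFoundException errors when running on Java 8 (alternative updates of Java 7 did not help either). However, when run in a JDK 7 Docker container, it works fine and is able to connect to my dev endpoint. I would really appreciate it if someone could clarify the root cause.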

Answer 1

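Please provide more information, such as where the Zeppelin instance is located. Is it running on your desktop/laptop, or as an AWS notebook server? Also, have you tried connecting with Zeppelin 0.7.3, as described in this AWS forum link: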

https://forums.aws.amazon.com/thread.jspa?threadID=285128

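According to the link above, dated July 2018, AWS Glue does not yet support Zeppelin 0.8. I am assuming that all other configuration and environment settings have been done as required. If you can provide additional information, I can help further.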

Update: In any case, refer to "here" and "setting up zeppelin on windows" for help with setting up a local development environment and a Zeppelin notebook.

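Once the Zeppelin notebook is set up, establish an SSH connection (using the AWS Glue DevEndpoint URL) so that you can access the Data Catalog, crawlers, etc., as well as the S3 bucket where the data resides. You can then create Python scripts in the Zeppelin notebook and run them from Zeppelin.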
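As a rough sketch of the SSH step, the helper below only assembles the port-forwarding command. The forwarding target `169.254.76.1:9007` and the `glue` user follow the AWS Glue local Zeppelin tutorial as I recall it, so verify them against the current documentation. The function name, key file, and host name are hypothetical placeholders.

```python
def build_tunnel_command(key_path, endpoint_dns, local_port=9007,
                         remote_host="169.254.76.1", remote_port=9007):
    """Return the ssh argument list for a local port forward to a dev endpoint.

    The default host and ports are assumptions taken from the AWS tutorial,
    not values confirmed in this thread.
    """
    forward = "{}:{}:{}".format(local_port, remote_host, remote_port)
    # -N: no remote command, -T: no pseudo-terminal, -L: local port forward
    return ["ssh", "-i", key_path, "-NTL", forward,
            "glue@{}".format(endpoint_dns)]


if __name__ == "__main__":
    # Placeholder key file and endpoint address; substitute your own.
    cmd = build_tunnel_command("private-key.pem", "your-dev-endpoint-public-dns")
    print(" ".join(cmd))
```

Run the printed command in a separate terminal and leave it open while working in Zeppelin; the notebook's Spark interpreter would then be pointed at the forwarded local port.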

You can use the dev instance provided by Glue, but you may incur additional costs for it (EC2 instance charges).

Environment settings (updated based on comments):

JAVA_HOME=E:\Java7\jre7
Path=E:\Python27;E:\Python27\Lib;E:\Python27\Scripts;
PYTHONPATH=E:\spark-2.1.0-bin-hadoop2.7\python;E:\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip;E:\spark-2.1.0-bin-hadoop2.7\python\lib\pyspark.zip
SPARK_HOME=E:\spark-2.1.0-bin-hadoop2.7

Change the drive names/folders accordingly. Let me know if you need help.
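A mistyped path in these variables is easy to miss, so a small check like the following may help before starting Zeppelin. It is only a sketch: the function name `check_environment` is made up for illustration, and it verifies only that the paths exist, not that the Java or Spark versions are the right ones.

```python
import os


def check_environment(env):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    # JAVA_HOME and SPARK_HOME should each point at an existing directory.
    for name in ("JAVA_HOME", "SPARK_HOME"):
        value = env.get(name)
        if not value:
            problems.append("{} is not set".format(name))
        elif not os.path.isdir(value):
            problems.append("{} points to a missing directory: {}".format(name, value))
    # Every PYTHONPATH entry (directory or .zip file) should exist.
    pythonpath = env.get("PYTHONPATH", "")
    if not pythonpath:
        problems.append("PYTHONPATH is not set")
    for entry in pythonpath.split(os.pathsep):
        if entry and not os.path.exists(entry):
            problems.append("PYTHONPATH entry does not exist: {}".format(entry))
    return problems


if __name__ == "__main__":
    for problem in check_environment(os.environ) or ["environment looks consistent"]:
        print(problem)
```

An empty result means only that the paths resolve; it says nothing about whether Zeppelin will accept the Java version in use.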
