Spark Streaming 1.6.0 on EMR with Python: ClassNotFoundException: org.apache.spark.streaming.kinesis.KinesisUtilsPythonHelper

Date: 2016-03-24 20:48:35

Tags: apache-spark pyspark spark-streaming amazon-emr amazon-kinesis

I'm running an out-of-the-box EMR cluster on AWS with Spark 1.6.0 and Zeppelin 0.5.6. My goal is to initialize a simple Spark Streaming context that points at an internal Kinesis stream, as a proof of concept. However, when I run it, I get:

Py4JJavaError: An error occurred while calling o89.loadClass. : 
java.lang.ClassNotFoundException: org.apache.spark.streaming.kinesis.KinesisUtilsPythonHelper
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)

The code I'm running (through Zeppelin) is just:

%pyspark
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

ssc = StreamingContext(sc, 1)

appName = '{my-app-name}'
streamName = '{my-stream-name}'
endpointUrl = '{my-endpoint}'
regionName = '{my-region}'

lines = KinesisUtils.createStream(ssc, appName, streamName, endpointUrl, regionName, InitialPositionInStream.LATEST, 2)

When I hit this locally, I made sure to build spark-streaming-kinesis-asl from source and include the resulting jars in my Spark configuration:

spark.driver.extraClassPath /path/to/kinesis/asl/assembly/jars/*

On EMR, however, I can't get this to work. To be safe, I included the jars in the configuration there as well, to no avail.

Has anyone run into this? I print out the Spark configuration when I restart the context to confirm that these changes are being picked up. Perhaps this also needs to be done on the slave nodes? Or might there be another configuration option/key?

1 answer:

Answer 0 (score: 2)

Add the dependency through the Zeppelin context "z". Here is an example that loads the spark-csv package:

%dep
z.load("com.databricks:spark-csv_2.11:1.3.0")
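
Applied to the question, the same mechanism could presumably load the Kinesis connector instead of spark-csv. The sketch below is only an illustration, not part of the original answer: the helper names are hypothetical, and the Scala binary version 2.10 is an assumption (it is the default for Spark 1.6.0 builds) that should be checked against the cluster. It builds the Maven coordinate for spark-streaming-kinesis-asl and prints the line to paste into a %dep paragraph.

```python
# Hypothetical helpers: build the Maven coordinate for the Kinesis connector
# and the matching z.load(...) line for a Zeppelin %dep paragraph.

def kinesis_asl_coordinate(spark_version, scala_binary_version="2.10"):
    """Return groupId:artifactId:version for spark-streaming-kinesis-asl.

    scala_binary_version defaults to 2.10, which is an assumption; it must
    match the Scala version the cluster's Spark was built with.
    """
    return "org.apache.spark:spark-streaming-kinesis-asl_{0}:{1}".format(
        scala_binary_version, spark_version
    )


def zeppelin_dep_line(coordinate):
    """Return the z.load(...) call to place under %dep in a Zeppelin note."""
    return 'z.load("{0}")'.format(coordinate)


if __name__ == "__main__":
    # Spark 1.6.0 is the version named in the question.
    print(zeppelin_dep_line(kinesis_asl_coordinate("1.6.0")))
```

A %dep paragraph has to run before the Spark interpreter starts, so if a %pyspark paragraph has already run, restart the interpreter before loading the dependency.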