Error when using filter(), map(), ... in the Spark Java API (org.apache.spark.SparkException)

Date: 2017-08-28 12:45:58

Tags: java apache-spark cassandra spark-cassandra-connector

I am new to Spark. When I use Spark's filter() in the Java API I get the error below (collect() on the whole table works fine, and I can see that all the data comes from Cassandra). I have checked that the master and worker versions are the same. When the application starts I can see it in the Spark web UI, but then:

[Stage 0:>                                                          (0 + 0) / 6]
[Stage 0:>                                                          (0 + 2) / 6]
[Stage 0:>                                                          (0 + 4) / 6]

2017-08-28 16:37:16,239 ERROR TaskSetManager:70 - Task 1 in stage 0.0 failed 4 times; aborting job
2017-08-28 16:37:21,351 ERROR DefaultExceptionMapper:170 - Unexpected error occurred
org.apache.wicket.WicketRuntimeException: Method onRequest of interface org.apache.wicket.behavior.IBehaviorListener targeted at org.apache.wicket.extensions.ajax.markup.html.AjaxLazyLoadPanel$1@e7e7465 on component [AjaxLazyLoadPanel [Component id = panel]] threw an exception
    at org.apache.wicket.RequestListenerInterface.internalInvoke(RequestListenerInterface.java:268)
    at org.apache.wicket.RequestListenerInterface.invoke(RequestListenerInterface.java:241)
    at org.apache.wicket.core.request.handler.ListenerInterfaceRequestHandler.invokeListener(ListenerInterfaceRequestHandler.java:248)
    at org.apache.wicket.core.request.handler.ListenerInterfaceRequestHandler.respond(ListenerInterfaceRequestHandler.java:234)
    at org.apache.wicket.request.cycle.RequestCycle$HandlerExecutor.respond(RequestCycle.java:895)
    at org.apache.wicket.request.RequestHandlerStack.execute(RequestHandlerStack.java:64)
    at org.apache.wicket.request.cycle.RequestCycle.execute(RequestCycle.java:265)
    at org.apache.wicket.request.cycle.RequestCycle.processRequest(RequestCycle.java:222)
    at org.apache.wicket.request.cycle.RequestCycle.processRequestAndDetach(RequestCycle.java:293)
    at org.apache.wicket.protocol.http.WicketFilter.processRequestCycle(WicketFilter.java:261)
    at org.apache.wicket.protocol.http.WicketFilter.processRequest(WicketFilter.java:203)
    at org.apache.wicket.protocol.http.WicketFilter.doFilter(WicketFilter.java:284)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:217)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1091)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:673)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1500)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1456)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)

Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.wicket.RequestListenerInterface.internalInvoke(RequestListenerInterface.java:258)
    ... 29 more

Caused by: java.lang.RuntimeException: Panel me.SparkTestPanel could not be constructed.
    at ...

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 10, 21.1.0.41, executor 1): java.lang.ClassNotFoundException: me.SparkTestPanel$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1826)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
    at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
    at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
    at me.SparkTestPanel.<init>(SparkTestPanel.java:77)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    ... 39 more

Caused by: java.lang.ClassNotFoundException: me.SparkTestPanel$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1826)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more

My code is:

import com.datastax.spark.connector.japi.CassandraJavaUtil;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapColumnTo;
import com.datastax.spark.connector.japi.CassandraRow;
import com.datastax.spark.connector.japi.rdd.CassandraTableScanJavaRDD;

import java.util.List;
import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

import org.apache.wicket.markup.html.form.Form;
import org.apache.wicket.markup.html.panel.Panel;

/**
 *
 * @author mohamadreza
 */
public class SparkTestPanel extends Panel {

    private Form form;

    public SparkTestPanel(String id) {
        super(id);
        form = new Form("form");
        form.setOutputMarkupId(true);
        this.add(form);             
        SparkConf conf = new SparkConf(true);
        conf.setAppName("Spark Test");
        conf.setMaster("spark://192.16.11.18:7049");
        conf.set("spark.closure.serializer","org.apache.spark.serializer.JavaSerializer");
        conf.set("spark.serializer","org.apache.spark.serializer.JavaSerializer");

        conf.set("spark.cassandra.connection.host", "192.16.11.18");
        conf.set("spark.cassandra.connection.port", "7005");
        conf.set("spark.cassandra.auth.username", "user");
        conf.set("spark.cassandra.auth.password", "password");
        JavaSparkContext sc = null;
        try {
            sc = new JavaSparkContext(conf);
            JavaRDD<CassandraRow> cache = javaFunctions(sc).cassandraTable("keyspace", "test").cache();
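            // The anonymous Function below compiles to me.SparkTestPanel$1 --
            // the exact class the executors fail to load in the stack trace above.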
            Long count = cache.filter(new Function<CassandraRow, Boolean>() {
                @Override
                public Boolean call(CassandraRow t1) throws Exception {
                    return t1.getString("value").contains("test");
                }
            }).count();
            String a = count.toString();
        } finally {
            if (sc != null) { // guard: the JavaSparkContext constructor may have thrown
                sc.stop();
            }
        }
    }
}

Spark version is 2.1.1, Scala version 2.11, Java 8, and my pom.xml:

<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.1.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.11</artifactId>
        <version>2.0.5</version>
    </dependency>

I am using Docker for the Cassandra and Spark nodes (Cassandra version 3.0). Can anyone help me?

1 Answer:

Answer 0 (score: 1)

Problem solved :)

If you want to use Apache Spark's Java API, you must copy your application's .jar (found in the target directory in the project root after building) to $SPARK_PATH/jars/ on every Spark node (master and workers). If your .jar is very large, you can split the UI and the Spark code into separate projects, copy only the Spark-code project's .jar, and call that Spark code from your UI project. A programmatic alternative is sketched below.
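A minimal sketch of that alternative (not from the original answer): SparkConf.setJars() tells Spark to ship the listed jars to the executors when the context starts, so classes such as me.SparkTestPanel$1 can be deserialized on the workers without copying anything into $SPARK_PATH/jars/ by hand. The jar path "target/spark-test.jar" is a hypothetical name; point it at your own built artifact.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class JarShippingSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf(true);
        conf.setAppName("Spark Test");
        conf.setMaster("spark://192.16.11.18:7049");
        // Ship the application jar to every executor at startup instead of
        // copying it into $SPARK_PATH/jars/ on each node.
        // "target/spark-test.jar" is a hypothetical path -- use your built jar.
        conf.setJars(new String[]{"target/spark-test.jar"});
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // filter()/map() closures defined in classes packaged in this jar
            // can now be loaded by the workers.
        } finally {
            sc.stop();
        }
    }
}

Either way the root cause is the same: the executors deserialize the task closure and need the bytecode for me.SparkTestPanel$1 on their own classpath.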