I am new to Spark. When I use a Spark filter in the Java API I get the error below (if I collect() the whole table everything works, and I can see all the data coming from Cassandra). I have checked that the master and worker versions are the same. When the application starts I can see it in the Spark web UI, but:
[Stage 0:> (0 + 0) / 6]
[Stage 0:> (0 + 2) / 6]
[Stage 0:> (0 + 4) / 6]
2017-08-28 16:37:16,239 ERROR TaskSetManager:70 - Task 1 in stage 0.0 failed 4 times; aborting job
2017-08-28 16:37:21,351 ERROR DefaultExceptionMapper:170 - Unexpected error occurred
org.apache.wicket.WicketRuntimeException: Method onRequest of interface org.apache.wicket.behavior.IBehaviorListener targeted at org.apache.wicket.extensions.ajax.markup.html.AjaxLazyLoadPanel$1@e7e7465 on component [AjaxLazyLoadPanel [Component id = panel]] threw an exception
at org.apache.wicket.RequestListenerInterface.internalInvoke(RequestListenerInterface.java:268)
at org.apache.wicket.RequestListenerInterface.invoke(RequestListenerInterface.java:241)
at org.apache.wicket.core.request.handler.ListenerInterfaceRequestHandler.invokeListener(ListenerInterfaceRequestHandler.java:248)
at org.apache.wicket.core.request.handler.ListenerInterfaceRequestHandler.respond(ListenerInterfaceRequestHandler.java:234)
at org.apache.wicket.request.cycle.RequestCycle$HandlerExecutor.respond(RequestCycle.java:895)
at org.apache.wicket.request.RequestHandlerStack.execute(RequestHandlerStack.java:64)
at org.apache.wicket.request.cycle.RequestCycle.execute(RequestCycle.java:265)
at org.apache.wicket.request.cycle.RequestCycle.processRequest(RequestCycle.java:222)
at org.apache.wicket.request.cycle.RequestCycle.processRequestAndDetach(RequestCycle.java:293)
at org.apache.wicket.protocol.http.WicketFilter.processRequestCycle(WicketFilter.java:261)
at org.apache.wicket.protocol.http.WicketFilter.processRequest(WicketFilter.java:203)
at org.apache.wicket.protocol.http.WicketFilter.doFilter(WicketFilter.java:284)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:217)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1091)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:673)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1500)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1456)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.wicket.RequestListenerInterface.internalInvoke(RequestListenerInterface.java:258)
... 29 more
Caused by: java.lang.RuntimeException: Panel me.SparkTestPanel could not be constructed. at ...
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 10, 21.1.0.41, executor 1): java.lang.ClassNotFoundException: me.SparkTestPanel$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1826)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
at me.SparkTestPanel.<init>(SparkTestPanel.java:77)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
... 39 more
Caused by: java.lang.ClassNotFoundException: me.SparkTestPanel$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1826)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
My code is:
import com.datastax.spark.connector.japi.CassandraJavaUtil;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapColumnTo;
import com.datastax.spark.connector.japi.CassandraRow;
import com.datastax.spark.connector.japi.rdd.CassandraTableScanJavaRDD;
import java.util.List;
import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.wicket.markup.html.form.Form;
import org.apache.wicket.markup.html.panel.Panel;

/**
 *
 * @author mohamadreza
 */
public class SparkTestPanel extends Panel {

    private Form form;

    public SparkTestPanel(String id) {
        super(id);
        form = new Form("form");
        form.setOutputMarkupId(true);
        this.add(form);

        SparkConf conf = new SparkConf(true);
        conf.setAppName("Spark Test");
        conf.setMaster("spark://192.16.11.18:7049");
        conf.set("spark.closure.serializer", "org.apache.spark.serializer.JavaSerializer");
        conf.set("spark.serializer", "org.apache.spark.serializer.JavaSerializer");
        conf.set("spark.cassandra.connection.host", "192.16.11.18");
        conf.set("spark.cassandra.connection.port", "7005");
        conf.set("spark.cassandra.auth.username", "user");
        conf.set("spark.cassandra.auth.password", "password");

        JavaSparkContext sc = null;
        try {
            sc = new JavaSparkContext(conf);
            JavaRDD<CassandraRow> cache = javaFunctions(sc).cassandraTable("keyspace", "test").cache();
            // This anonymous Function is compiled to me.SparkTestPanel$1,
            // the class the executors fail to load in the trace above.
            Long count = cache.filter(new Function<CassandraRow, Boolean>() {
                @Override
                public Boolean call(CassandraRow t1) throws Exception {
                    return t1.getString("value").contains("test");
                }
            }).count();
            String a = count.toString();
        } finally {
            if (sc != null) { // guard against an NPE if the context failed to start
                sc.stop();
            }
        }
    }
}
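A side note on the trace: `me.SparkTestPanel$1` is the name the compiler gives the anonymous `Function` above. Since `org.apache.spark.api.java.function.Function` is a serializable functional interface, under Java 8 the same filter can also be written as a lambda. A minimal, behaviorally identical sketch (this alone does not cure the ClassNotFoundException, because the executors still need the application's classes on their classpath):

// Same filter as a Java 8 lambda; Spark serializes the lambda, but the
// enclosing class must still be loadable on every executor.
Long count = cache
        .filter(row -> row.getString("value").contains("test"))
        .count();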
The Spark version is 2.1.1, Scala version 2.11, Java 8, and my pom.xml:
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.1.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.11</artifactId>
<version>2.0.5</version>
</dependency>
I use Docker for the Cassandra and Spark nodes (Cassandra version 3.0). Can anyone help me?
Answer 0 (score: 1)
Problem solved :)
If you want to use Apache Spark's Java API, you must copy your project's `.jar` (found in the `target` directory in the project root) to `$SPARK_PATH/jars/` on every Spark node (master and workers). If your `.jar` is very large, you can split the UI and Spark code into separate projects, copy only the `.jar` of the Spark-code project, and use that Spark code from your UI project. An alternative sketch is shown below.
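As an alternative to copying the jar by hand, Spark can also distribute it for you: `SparkConf.setJars` ships the listed jars to the executors when the context starts. A minimal sketch, where the jar path is a placeholder for wherever your build writes the artifact:

// Sketch: let Spark ship the application jar to the executors itself,
// instead of copying it into $SPARK_PATH/jars/ on every node.
// "/path/to/your-project.jar" is a placeholder, not a real path.
SparkConf conf = new SparkConf(true)
        .setAppName("Spark Test")
        .setMaster("spark://192.16.11.18:7049")
        .setJars(new String[]{"/path/to/your-project.jar"});

The equivalent configuration property is `spark.jars`, which can be set the same way as the other `conf.set(...)` options above.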