Using Java

Time: 2015-06-19 12:51:20

Tags: java cassandra apache-spark spark-cassandra-connector

Here is my code, which simply reads a column family using the Spark Cassandra connector:

import static com.datastax.spark.connector.japi.CassandraJavaUtil.*;

import com.datastax.spark.connector.japi.SparkContextJavaFunctions;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;

public class Main {
    private static final String HOST = "spark://sparkmaster:7077";
//    private static final String HOST = "local[4]";

    private static final String APP_NAME = "Cassandra Spark WordCount";

    public static void main (String... args) {
        String[] jars = {
                "./build/libs/CassandraSparkMapReduce-1.0-SNAPSHOT.jar"
        };

        SparkConf conf = new SparkConf(true)
                .set("spark.cassandra.connection.host", "107.108.214.154")
                .set("spark.executor.userClassPathFirst", "true")
                .setJars(jars);

        SparkContext sc = new SparkContext(HOST, APP_NAME, conf);
        SparkContextJavaFunctions context = javaFunctions(sc);

        JavaRDD<String> rdd = context.cassandraTable("wordcount", "input")
                .map(row -> row.toString());

        System.out.println(rdd.toArray());
    }
}

Here is my build.gradle file, used to build and run the application:

group 'in.suyash.tests'
version '1.0-SNAPSHOT'

apply plugin: 'java'
apply plugin: 'application'

sourceCompatibility = 1.8

repositories {
    mavenCentral()
}

dependencies {
    compile group: 'org.apache.spark', name: 'spark-core_2.10', version: '1.4.0'

    compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.10', version: '1.4.0-M1'
    compile group: 'com.datastax.spark', name: 'spark-cassandra-connector-java_2.10', version: '1.4.0-M1'

    testCompile group: 'junit', name: 'junit', version: '4.11'
}

sourceSets {
    main {
        java {
            srcDir './'
        }
    }
}

mainClassName = 'Main'

// http://stackoverflow.com/a/14441628/3673043
jar {
    doFirst {
        from {
            configurations.compile.collect {
                it.isDirectory() ? it : zipTree(it)
            }
        }
    }
    exclude 'META-INF/*.RSA', 'META-INF/*.SF','META-INF/*.DSA'
}
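A possible explanation (my assumption, not something the post confirms): the fat jar built above bundles the Spark core classes themselves, and `spark.executor.userClassPathFirst=true` tells the executor to prefer classes from the user jar. The executor can then end up with two copies of classes such as `org.apache.spark.scheduler.ResultTask`, loaded by different classloaders, and a cast between them fails with exactly the kind of ClassCastException reported in this question. A hedged sketch of a fix is to keep Spark itself out of the fat jar via a custom "provided" configuration (Gradle 2.x has no built-in provided scope, so the configuration name below is illustrative):

```groovy
// Sketch only: declare Spark as "provided" so it compiles against it
// but is NOT bundled into the fat jar (the cluster supplies Spark).
configurations {
    provided
}

sourceSets.main.compileClasspath += configurations.provided

dependencies {
    // moved here from the compile scope
    provided group: 'org.apache.spark', name: 'spark-core_2.10', version: '1.4.0'
}

jar {
    doFirst {
        from {
            // bundle only runtime dependencies, not the provided ones
            (configurations.compile - configurations.provided).collect {
                it.isDirectory() ? it : zipTree(it)
            }
        }
    }
    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'
}
```

Alternatively, simply dropping the `spark.executor.userClassPathFirst` setting would let the executor's own Spark classes win, which avoids the duplicate-class situation without restructuring the jar.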

I first run gradle build to build the jar, and then gradle run to execute my job. The job fails, however, and looking at the stderr of an executor I see the following exception:

java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I have a 3-node setup in which one node acts as the Spark master while the other two act as Spark workers; the same two nodes also form a Cassandra ring. I can execute the job locally if I change my Spark master to local, but on the cluster I get this strange exception, which I could not find mentioned anywhere else. Versions:

  • Spark:1.4.0
  • Cassandra:2.1.6
  • spark-cassandra-connector:1.4.0-M1


Edit

I cannot say exactly how I solved this, but I removed every Java installation from all of the nodes, rebooted everything, installed a fresh copy of jdk1.8.0_45, and started my cluster again; the job now completes successfully. Any explanation of this behaviour would be welcome.
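One way the "reinstall Java everywhere" fix can be sanity-checked, assuming the problem really was mismatched JVM installations across nodes: run a tiny probe on each node and compare the output. The class name below is my own illustration, not part of the original post.

```java
// Minimal probe: print the properties that identify the local JVM.
// Running this on every node (e.g. over ssh) and diffing the output
// would reveal whether master and workers run different Java installs.
public class JvmCheck {
    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
    }
}
```

If the values differ between nodes, serialized task bytes produced by one JVM are being deserialized by a differently configured one, which is at least consistent with the odd scheduler-level ClassCastException seen here.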

0 Answers:

There are no answers.