How do I use a non-thread-safe 3rd-party dependency in Spark?

Asked: 2015-07-07 19:43:55

Tags: java apache-spark

How "thread-safe" is Spark? I have something like this in Java:

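import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

class A implements Function<String, Boolean> {
    // Instance field: one calculator per instance of A.
    NotThreadSafe3rdParty calculator = new NotThreadSafe3rdParty();
    public Boolean call(String s) {
        return calculator.calc(s);
    }
}

class B implements Function<String, Boolean> {
    // Static field: one calculator shared by every B in the same JVM.
    static NotThreadSafe3rdParty calculator;
    static {
        calculator = new NotThreadSafe3rdParty();
    }
    public Boolean call(String s) {
        return calculator.calc(s);
    }
}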

class MyRun {
    public static void main(String[] args) {
        String myPath = "/data/path";
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> myData = sc.textFile(myPath);

        long numAs = myData.filter(new A()).count();
        long numBs = myData.filter(new B()).count();
    }
}
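
  1. Is the usage of class A correct?
  2. Is the usage of class B correct?
  3. What if NotThreadSafe3rdParty in class A is a JNI wrapper around C code (crfsuite, for example)?
  4. How should such dependencies be used correctly?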
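
For reference, one pattern often suggested for this kind of dependency is to confine each calculator to a single thread with a ThreadLocal. The sketch below is plain Java with no Spark dependency, and it rests on assumptions: the NotThreadSafe3rdParty class here is a hypothetical stand-in (only the calc(String) method name comes from the code above), and PerThreadCalculator and its instance counter are illustrative names, not part of any library.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class PerThreadCalculator {

    // Hypothetical stand-in for the real library class; only calc(String)
    // is taken from the question, the body is made up for illustration.
    static class NotThreadSafe3rdParty {
        // Counts how many calculators have been constructed (demo only).
        static final AtomicInteger INSTANCES = new AtomicInteger();

        // Unsynchronized mutable state: unsafe if two threads share one instance.
        private final StringBuilder scratch = new StringBuilder();

        NotThreadSafe3rdParty() {
            INSTANCES.incrementAndGet();
        }

        boolean calc(String s) {
            scratch.setLength(0);
            scratch.append(s);
            return scratch.length() % 2 == 0; // true for even-length input
        }
    }

    // Each thread lazily creates its own calculator on first use and then
    // keeps reusing it, so no instance is ever touched by two threads.
    private static final ThreadLocal<NotThreadSafe3rdParty> CALCULATOR =
            ThreadLocal.withInitial(NotThreadSafe3rdParty::new);

    private PerThreadCalculator() {
    }

    // Same shape as the call(String) methods in classes A and B.
    static Boolean call(String s) {
        return CALCULATOR.get().calc(s);
    }
}
```

A function like class B could then delegate to PerThreadCalculator.call(s) instead of touching a shared static field directly. Note that thread confinement only protects state held per instance: if the dependency is a JNI wrapper whose native code keeps process-wide global state, separate Java instances may still interfere with each other, and limiting each executor to one concurrent task or synchronizing every call may be needed instead.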

0 answers:

No answers yet.