How do I serialize an ExecutorService in Java?

Time: 2019-02-27 12:20:48

Tags: java serialization apache-flink executorservice

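I created a CountMinSketch to compute the minimum frequency of some values, and I use an ExecutorService to update the sketch asynchronously. I use this class in a Flink project, so it has to be serializable, which is why it implements the Serializable interface. That is not enough, however, because the ExecutorService would also have to be serializable. How can I use an ExecutorService in a serializable way? Or is there a serializable ExecutorService implementation?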

import java.io.Serializable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CountMinSketch implements Serializable {

    private static final long serialVersionUID = 1123747953291780413L;

    private static final int H1 = 0;
    private static final int H2 = 1;
    private static final int H3 = 2;
    private static final int H4 = 3;
    private static final int LIMIT = 100;
    private final int[][] sketch = new int[4][LIMIT];

    final NaiveHashFunction h1 = new NaiveHashFunction(11, 9);
    final NaiveHashFunction h2 = new NaiveHashFunction(17, 15);
    final NaiveHashFunction h3 = new NaiveHashFunction(31, 65);
    final NaiveHashFunction h4 = new NaiveHashFunction(61, 101);

    private ExecutorService executor = Executors.newSingleThreadExecutor();

    public CountMinSketch() {
        // initialize sketch
    }

    public Future<Boolean> updateSketch(String value) {
        return executor.submit(() -> {
            sketch[H1][h1.getHashValue(value)]++;
            sketch[H2][h2.getHashValue(value)]++;
            sketch[H3][h3.getHashValue(value)]++;
            sketch[H4][h4.getHashValue(value)]++;
            return true;
        });
    }

    public Future<Boolean> updateSketch(String value, int count) {
        return executor.submit(() -> {
            sketch[H1][h1.getHashValue(value)] = sketch[H1][h1.getHashValue(value)] + count;
            sketch[H2][h2.getHashValue(value)] = sketch[H2][h2.getHashValue(value)] + count;
            sketch[H3][h3.getHashValue(value)] = sketch[H3][h3.getHashValue(value)] + count;
            sketch[H4][h4.getHashValue(value)] = sketch[H4][h4.getHashValue(value)] + count;
            return true;
        });
    }

    public int getFrequencyFromSketch(String value) {
        int valueH1 = sketch[H1][h1.getHashValue(value)];
        int valueH2 = sketch[H2][h2.getHashValue(value)];
        int valueH3 = sketch[H3][h3.getHashValue(value)];
        int valueH4 = sketch[H4][h4.getHashValue(value)];
        return findMinimum(valueH1, valueH2, valueH3, valueH4);
    }

    private int findMinimum(final int a, final int b, final int c, final int d) {
        return Math.min(Math.min(a, b), Math.min(c, d));
    }
}

import java.io.Serializable;

public class NaiveHashFunction implements Serializable {

    private static final long serialVersionUID = -3460094846654202562L;
    private final static int LIMIT = 100;
    private long prime;
    private long odd;

    public NaiveHashFunction(final long prime, final long odd) {
        this.prime = prime;
        this.odd = odd;
    }

    public int getHashValue(final String value) {
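        // Clear the sign bit instead of calling Math.abs:
        // Math.abs(Integer.MIN_VALUE) is still negative and would yield a negative index.
        final int hash = value.hashCode() & Integer.MAX_VALUE;
        return calculateHash(hash, prime, odd);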
    }

    private int calculateHash(final int hash, final long prime, final long odd) {
        return (int) ((((hash % LIMIT) * prime) % LIMIT) * odd) % LIMIT;
    }
}

The class that uses the sketch:

    public static class AverageAggregator implements
            AggregateFunction<Tuple3<Integer, Tuple5<Integer, String, Integer, String, Integer>, Double>, Tuple3<Double, Long, Integer>, Tuple2<String, Double>> {

        private static final long serialVersionUID = 7233937097358437044L;
        private String functionName;
        private CountMinSketch countMinSketch = new CountMinSketch();
.....
}

Error:

Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: The implementation of the AggregateFunction is not serializable. The object probably contains or references non serializable fields.
    at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:99)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:1559)
    at org.apache.flink.streaming.api.datastream.WindowedStream.aggregate(WindowedStream.java:811)
    at org.apache.flink.streaming.api.datastream.WindowedStream.aggregate(WindowedStream.java:730)
    at org.apache.flink.streaming.api.datastream.WindowedStream.aggregate(WindowedStream.java:701)
    at org.sense.flink.examples.stream.MultiSensorMultiStationsReadingMqtt2.<init>(MultiSensorMultiStationsReadingMqtt2.java:39)
    at org.sense.flink.App.main(App.java:141)
Caused by: java.io.NotSerializableException: java.util.concurrent.Executors$FinalizableDelegatedExecutorService
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:534)
    at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:81)
    ... 6 more

2 answers:

Answer 0 (score: 3)

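An ExecutorService contains state that cannot be serialized. Specifically, the worker threads ... and the state of the tasks they are working on can never be serialized with the standard object serialization classes.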

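If you don't really need to serialize the ExecutorService, you can mark the variable that refers to it as transient ... to keep it from being serialized by accident.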
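
A minimal sketch of the transient approach, assuming the deserialized copy simply needs a fresh executor. TransientExecutorSketch and its roundTrip helper are hypothetical stand-ins for the CountMinSketch class in the question, not code from the asker. A transient field is null after deserialization, so the private readObject hook re-creates it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, trimmed-down stand-in for the CountMinSketch class in the question.
class TransientExecutorSketch implements Serializable {

    private static final long serialVersionUID = 1L;

    // Plain data: this is what actually gets serialized.
    private final int[] counts = new int[100];

    // transient: skipped by Java serialization, so it would be null in the copy
    // unless readObject re-creates it.
    private transient ExecutorService executor = Executors.newSingleThreadExecutor();

    public Future<Boolean> update(int slot) {
        return executor.submit(() -> {
            counts[slot]++;
            return true;
        });
    }

    public int get(int slot) {
        return counts[slot];
    }

    public void shutdown() {
        executor.shutdown();
    }

    // Invoked by Java serialization while the object is being deserialized.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();                          // restores counts
        executor = Executors.newSingleThreadExecutor();  // fresh executor for the copy
    }

    // Serializes and deserializes in memory, similar to shipping the object elsewhere.
    static TransientExecutorSketch roundTrip(TransientExecutorSketch original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (TransientExecutorSketch) in.readObject();
        }
    }
}
```

The copy keeps the counts recorded before serialization and accepts new updates on its own executor. Any update still pending when the object is serialized may or may not be reflected in the copy, so wait on the returned Future first if that matters.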

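It is conceivable that you could serialize the ExecutorService's work queue. But serializing tasks that are already executing would require you to implement a custom mechanism for checkpointing a task's Callable / Runnable ... while it is running.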
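
To illustrate the work-queue idea only, here is a rough sketch under several assumptions: every queued task is itself Serializable, tasks are handed over with execute() (submit() wraps them in a FutureTask, which is not serializable), and losing the task that is currently running is acceptable. IncrementTask and PendingTaskSnapshot are hypothetical names. The sketch relies on shutdownNow(), which returns the tasks that never started.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;

// Hypothetical task type: a plain description of the work, so it can be serialized.
class IncrementTask implements Runnable, Serializable {

    private static final long serialVersionUID = 1L;

    final String key;

    IncrementTask(String key) {
        this.key = key;
    }

    @Override
    public void run() {
        // The real update would go here.
    }
}

class PendingTaskSnapshot {

    // Stops the executor and serializes the tasks that never started.
    // Throws NotSerializableException if any queued task is not Serializable.
    static byte[] drainAndSerialize(ExecutorService executor) throws IOException {
        List<Runnable> neverStarted = executor.shutdownNow();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(neverStarted));
        }
        return bytes.toByteArray();
    }

    // Restores the pending tasks so they can be handed to a new executor.
    @SuppressWarnings("unchecked")
    static List<Runnable> deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (List<Runnable>) in.readObject();
        }
    }
}
```

The restored list can be replayed with execute() on a new executor. The task that was running at shutdown is interrupted and is not part of the snapshot, which is exactly the gap the answer describes.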


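If you are trying to use serialization itself as a mechanism for checkpointing a computation, you are probably barking up the wrong tree. Serialization cannot capture the state held on thread stacks.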

Answer 1 (score: 0)

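You normally don't serialize functional components, only data. I can't really see what you are trying to do, but if you declare the ExecutorService field as transient, that should solve the problem. Use the Java keyword transient rather than a @Transient annotation, because Java serialization ignores annotations.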
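
One way to read "only serialize data" is to keep the executor out of the serialized form and create it on first use. The following is a sketch under that assumption; LazyExecutorCounter and its helper methods are hypothetical names, not part of the question's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical holder: `total` is data, the executor is machinery.
class LazyExecutorCounter implements Serializable {

    private static final long serialVersionUID = 1L;

    private long total;                          // data: serialized
    private transient ExecutorService executor;  // machinery: never serialized

    // Creates the executor on first use, including the first use after deserialization.
    private synchronized ExecutorService executor() {
        if (executor == null) {
            executor = Executors.newSingleThreadExecutor();
        }
        return executor;
    }

    public Future<Long> add(long amount) {
        return executor().submit(() -> {
            total += amount;
            return total;
        });
    }

    public synchronized boolean hasExecutor() {
        return executor != null;
    }

    public synchronized void shutdown() {
        if (executor != null) {
            executor.shutdown();
        }
    }

    // In-memory serialization round trip, used here only to demonstrate the behavior.
    static LazyExecutorCounter copyViaSerialization(LazyExecutorCounter original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (LazyExecutorCounter) in.readObject();
        }
    }
}
```

With lazy creation no readObject hook is needed, and a copy that is never updated never starts a thread. As with any asynchronous update, wait for pending Futures before serializing if the copy must include them.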