Lossless event processing with Esper

Date: 2018-04-09 17:51:15

Tags: complex-event-processing esper

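I am evaluating Esper as a system for lossless processing of billing data. The system is expected to handle about 20,000 events per second and to run about 400 statements that use continuous aggregation (without storing events in memory). To reach the expected performance I started sending events from multiple threads, and found that Esper frequently loses data.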

A simple example that shows the data loss:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPAdministrator;
import com.espertech.esper.client.EPRuntime;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;

public class Example {

    public static void main(String[] args) throws Exception {
        new Example().run();
    }

    public void run() throws Exception {
        Configuration config = new Configuration();

        // use default configuration
        EPServiceProvider epService = EPServiceProviderManager.getDefaultProvider(config);

        EPAdministrator epAdministrator = epService.getEPAdministrator();
        // simple schema
        epAdministrator.getConfiguration().addEventType(LogLine.class);

        // event for terminating context partition
        createEPL(epAdministrator, "create schema TerminateEvent() ");

        // Start context partition on LogLine event and terminate on TerminateEvent.
        createEPL(epAdministrator, "create context InitCtx start LogLine end TerminateEvent");

        // select the event count and the sum of bytes per context partition.
        EPStatement statement = createEPL(epAdministrator, "context InitCtx select context.id as partition_id, count(*), sum(bytes) from LogLine output last when terminated");

        // register listener to output all newEvents properties values
        statement.addListener((newEvents, oldEvents) -> {
            String resultEvents = Arrays.stream(newEvents).map((event) -> {
                return Arrays.stream(event.getEventType().getPropertyNames()).map((prop) -> {
                    return prop + "=" + event.get(prop);
                }).collect(Collectors.joining(", "));
            }).collect(Collectors.joining("]; ["));
            System.out.println("=== results: [" + resultEvents + "]");

        });

        // let's use 4 threads for sending data
        ExecutorService myexecutor = Executors.newFixedThreadPool(4);
        List<CompletableFuture<Void>> listOfTasks = new ArrayList<>();

        //get data to be processed
        List<LogLine> list = getData();
        for (int i = 1; i <= list.size(); i++) {
            //concurrently send each logline
            final LogLine logLine = list.get(i - 1);
            CompletableFuture<Void> task = CompletableFuture.runAsync(() -> {
                epService.getEPRuntime().sendEvent(logLine);
                System.out.println("== data " + logLine + " was send");
            }, myexecutor);
            listOfTasks.add(task);

            if (i % 4 == 0) {
                // terminate context partition after every 4 events.
                sendTerminateEvent(listOfTasks, epService.getEPRuntime());
            }
        }

        // terminate context partition at the end of the execution.
        sendTerminateEvent(listOfTasks, epService.getEPRuntime());

        // shut down all services.
        myexecutor.shutdown();
        epService.destroy();
    }

    private void sendTerminateEvent(List<CompletableFuture<Void>> listOfTasks, EPRuntime epRuntime) throws Exception {
        // wait for all submitted tasks to finish
        CompletableFuture[] array = listOfTasks.toArray(new CompletableFuture[listOfTasks.size()]);
        CompletableFuture.allOf(array).get(1, TimeUnit.MINUTES);
        listOfTasks.clear();

        System.out.println("== sending terminate event.");
        // send partition termination event
        epRuntime.sendEvent(Collections.emptyMap(), "TerminateEvent");
    }

    private List<LogLine> getData() {
        List<LogLine> dataEventsList = new ArrayList<>();
        dataEventsList.add(new LogLine(0, 1));
        dataEventsList.add(new LogLine(0, 2));
        dataEventsList.add(new LogLine(0, 3));
        dataEventsList.add(new LogLine(0, 4));
        dataEventsList.add(new LogLine(0, 5));
        dataEventsList.add(new LogLine(1, 1));
        dataEventsList.add(new LogLine(1, 2));
        dataEventsList.add(new LogLine(1, 3));
        dataEventsList.add(new LogLine(1, 4));
        dataEventsList.add(new LogLine(1, 5));
        return dataEventsList;
    }

    private EPStatement createEPL(EPAdministrator admin, String statement) {
        System.out.println("creating EPL: " + statement);
        return admin.createEPL(statement);
    }

    public static class LogLine {
        int account_id;
        int bytes;

        public LogLine(int account_id, int bytes) {
            this.account_id = account_id;
            this.bytes = bytes;
        }

        public int getAccount_id() {
            return account_id;
        }

        public int getBytes() {
            return bytes;
        }

        @Override
        public String toString() {
            return "[account_id=" + account_id + ", bytes=" + bytes + "]";
        }
    }

}

Execution output:

creating EPL: create schema TerminateEvent() 
creating EPL: create context InitCtx start LogLine end TerminateEvent
creating EPL: context InitCtx select context.id as partition_id, count(*), sum(bytes) from LogLine output last when terminated
== data [account_id=0, bytes=3] was send
== data [account_id=0, bytes=1] was send
== data [account_id=0, bytes=4] was send
== data [account_id=0, bytes=2] was send
== sending terminate event.
=== results: [partition_id=0, count(*)=4, sum(bytes)=10]
== data [account_id=1, bytes=2] was send
== data [account_id=1, bytes=3] was send
== data [account_id=0, bytes=5] was send
== data [account_id=1, bytes=1] was send
== sending terminate event.
=== results: [partition_id=1, count(*)=2, sum(bytes)=6]
== data [account_id=1, bytes=5] was send
== data [account_id=1, bytes=4] was send
== sending terminate event.
=== results: [partition_id=2, count(*)=1, sum(bytes)=4]

The first partition has the correct result; the next two partitions output invalid results:

// OK
actual   [partition_id=0, count(*)=4, sum(bytes)=10]
expected [partition_id=0, count(*)=4, sum(bytes)=10]

// LOSS
actual   [partition_id=1, count(*)=2, sum(bytes)=6]
expected [partition_id=1, count(*)=4, sum(bytes)=11]

// LOSS
actual   [partition_id=2, count(*)=1, sum(bytes)=4]
expected [partition_id=2, count(*)=2, sum(bytes)=9]
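
For reference, the expected figures above follow from grouping the ten input events in list order into batches of four, since a terminate event is sent after every fourth event and once more at the end. The following is only a minimal sketch of that arithmetic; the class `ExpectedTotals` and the method `perBatch` are hypothetical names used for illustration and have nothing to do with the Esper API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Esper): computes the count and sum that
// each context partition should report if no event is lost.
final class ExpectedTotals {

    // Returns one {count, sum} pair per batch of batchSize consecutive values.
    static List<long[]> perBatch(int[] bytes, int batchSize) {
        List<long[]> result = new ArrayList<>();
        for (int start = 0; start < bytes.length; start += batchSize) {
            int end = Math.min(start + batchSize, bytes.length);
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += bytes[i];
            }
            result.add(new long[] {end - start, sum});
        }
        return result;
    }

    public static void main(String[] args) {
        // bytes values in the same order as getData() in the example
        int[] bytes = {1, 2, 3, 4, 5, 1, 2, 3, 4, 5};
        List<long[]> batches = perBatch(bytes, 4);
        for (int i = 0; i < batches.size(); i++) {
            System.out.println("partition_id=" + i
                    + ", count(*)=" + batches.get(i)[0]
                    + ", sum(bytes)=" + batches.get(i)[1]);
        }
    }
}
```

The three batches are (1, 2, 3, 4), (5, 1, 2, 3) and (4, 5), which gives the counts 4, 4, 2 and the sums 10, 11, 9 listed as expected.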

What is wrong with this example code?

Enabling prioritized execution order did not help:

creating EPL: create schema TerminateEvent() 
creating EPL: @Priority(1) create context InitCtx start LogLine end TerminateEvent
creating EPL: @Priority(0) context InitCtx select context.id as partition_id, count(*), sum(bytes) from LogLine output last when terminated
== data [account_id=0, bytes=3] was send
== data [account_id=0, bytes=4] was send
== data [account_id=0, bytes=1] was send
== data [account_id=0, bytes=2] was send
== sending terminate event.
=== results: [partition_id=0, count(*)=4, sum(bytes)=10]
== data [account_id=1, bytes=2] was send
== data [account_id=1, bytes=3] was send
== data [account_id=0, bytes=5] was send
== data [account_id=1, bytes=1] was send
== sending terminate event.
=== results: [partition_id=1, count(*)=2, sum(bytes)=6]
== data [account_id=1, bytes=5] was send
== data [account_id=1, bytes=4] was send
== sending terminate event.
=== results: [partition_id=2, count(*)=1, sum(bytes)=4]

1 answer:

Answer 0 (score: 0)

This question is a more elaborate duplicate of "Esper data loss when inbound threading is enabled".

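Where your Esper EPL requires ordered execution, you must write your code so that events are processed in an ordered fashion. Esper cannot magically enforce an ordering. The JVM can pause any thread at any time. You have to design your code accordingly.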

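For example, suppose you have 2 threads. Assume that A events can be processed in parallel, while B must be processed in the position shown in the example below.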

Suppose events come in and you want B to be processed after A1 and A2 but before A3 and A4:

A1 A2 B1 A3 A4

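If you simply add all A and B events to a queue served by a thread pool of, say, 5 threads, then B may be processed first, in the middle, or last. Because the JVM does not enforce an order, every run can produce a different result. Esper cannot enforce the order either, because your application drives Esper and not the other way around.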

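What you can do, for example, is add the first group of A events (A1, A2) to the queue. When B comes in, wait for the queue to empty. Then add B to the queue and wait for B to complete. Then add the next group of A events (A3, A4) to the queue. This way you get ordered processing of B relative to the A events, while the A events themselves are still processed in parallel.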
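
The queue-and-wait scheme just described can be sketched with plain `java.util.concurrent`. This is only an illustration under stated assumptions: `OrderedDispatcher`, `submitParallel`, `submitOrdered` and `awaitInFlight` are hypothetical names, events are plain strings, and the `sink` consumer stands in for whatever actually delivers the event (in the question's code that would be a call to `epRuntime.sendEvent(...)`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, not part of Esper. It is meant to be driven by a
// single dispatching thread; only the sink is called from pool threads.
final class OrderedDispatcher {
    private final ExecutorService pool;
    private final Consumer<String> sink; // assumption: stands in for sendEvent
    private final List<Future<?>> inFlight = new ArrayList<>();

    OrderedDispatcher(ExecutorService pool, Consumer<String> sink) {
        this.pool = pool;
        this.sink = sink;
    }

    // "A" events: hand over to the pool, may run in parallel with each other.
    void submitParallel(String event) {
        inFlight.add(pool.submit(() -> sink.accept(event)));
    }

    // "B" events: wait until all earlier events are done, then process the
    // event on the calling thread so it has finished before anything later.
    void submitOrdered(String event) throws Exception {
        awaitInFlight();
        sink.accept(event);
    }

    // Blocks until every task submitted so far has completed.
    void awaitInFlight() throws Exception {
        for (Future<?> f : inFlight) {
            f.get();
        }
        inFlight.clear();
    }

    // Demo driver: events starting with "B" are treated as ordered events.
    static List<String> process(List<String> events, int threads) throws Exception {
        List<String> processed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            OrderedDispatcher dispatcher = new OrderedDispatcher(pool, processed::add);
            for (String event : events) {
                if (event.startsWith("B")) {
                    dispatcher.submitOrdered(event);
                } else {
                    dispatcher.submitParallel(event);
                }
            }
            dispatcher.awaitInFlight();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(process(Arrays.asList("A1", "A2", "B1", "A3", "A4"), 5));
    }
}
```

With this arrangement A1 and A2 may complete in either order, as may A3 and A4, but B1 is always processed third. The cost is that the dispatching thread blocks at every ordered event, so throughput depends on how often such events occur.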

CORRECTION:

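I see now that you have only one event type and not A + B. In that case, make sure you are running the latest version. Also make sure that "create context" does not get a lower priority, otherwise the context partition is created last. I ran your code about 10 times and did not see invalid output with 7.1.0. I am on JDK 1.8.0_121 (Oracle).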