Question

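I am developing an application that connects to a logical replication slot in order to consume WAL events. These WAL events are then forwarded to an MQ broker. This works well, but I noticed that I run out of memory after some time. I managed to narrow the problem down to the code responsible for fetching the WAL events. It occurs with the following code: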

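import java.nio.ByteBuffer;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;
import org.apache.log4j.Logger; // assumed: log4j 1.x, based on Logger.getLogger(Class) and error(Object)
import org.postgresql.PGConnection;
import org.postgresql.PGProperty;
import org.postgresql.replication.LogSequenceNumber;
import org.postgresql.replication.PGReplicationStream;

final Properties properties = new Properties();

PGProperty.USER.set(properties, "user");
PGProperty.PASSWORD.set(properties, "password");
PGProperty.ASSUME_MIN_SERVER_VERSION.set(properties, "9.4");
PGProperty.REPLICATION.set(properties, "database");
PGProperty.PREFER_QUERY_MODE.set(properties, "simple");

// Outer loop: reconnect whenever the replication connection fails.
while (true) {
    Connection          connection   = null;
    PGConnection        pgConnection = null; // renamed so it no longer shadows the PGConnection type
    PGReplicationStream stream       = null;

    try {
        connection = DriverManager.getConnection("jdbc:postgresql://localhost:5432/db", properties);
        pgConnection = connection.unwrap(PGConnection.class);
        stream = pgConnection.getReplicationAPI().replicationStream().logical().withSlotName("slot").start();

        // Inner loop: block until the next WAL event arrives, then acknowledge it.
        while (true) {
            final ByteBuffer buffer = stream.read();

            // ... logic here ... (disabled during memory test)

            // Report the last received LSN as both applied and flushed.
            final LogSequenceNumber lsn = stream.getLastReceiveLSN();
            stream.setAppliedLSN(lsn);
            stream.setFlushedLSN(lsn);
        }
    } catch (final SQLException e1) {
        Logger.getLogger(getClass()).error(e1);

        // Release the stream and the connection before reconnecting.
        if (stream != null) {
            try {
                stream.close();
            } catch (final SQLException e2) {
                Logger.getLogger(getClass()).error(e2);
            }
        }
        if (connection != null) {
            try {
                connection.close();
            } catch (final SQLException e2) {
                Logger.getLogger(getClass()).error(e2);
            }
        }
    }
}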

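I commented out the logic that parses the messages and forwards them to the MQ broker, because the out-of-memory condition also occurs without it.

I also tried changing this example to use the polling method readPending() instead of the blocking method read() (as shown at https://jdbc.postgresql.org/documentation/head/replication.html), but the problem remains.

I also noticed that after a while the application's CPU usage reaches 100%. This must be caused by the underlying library, because at that point read() is still working normally (i.e., it processes each WAL event in order).

During these tests I was executing INSERT and UPDATE queries at a low rate.

I am using the following dependency: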

<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>42.1.4</version>
</dependency>

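The application runs as a WAR in a Tomcat 8 container.

Any idea what is going on?

Update 1

I figured out what is happening, but so far I cannot explain it. I will go into detail.

As mentioned, every 10 seconds I run INSERT and UPDATE queries. These queries produce 645 WAL events, so every 10 seconds I have to read() 645 events. At the start, it takes 0 (or sometimes 1) milliseconds to read() one event. After a while, it takes 1 millisecond. Then, after some more time, it takes 2 milliseconds. And so on...

After some time, I can no longer read() 645 events within 10 seconds, because the time read() needs keeps increasing. This explains the 100% CPU usage and the out-of-memory condition.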
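To make the drift described above measurable, here is a minimal sketch that computes the time budget per event (10 seconds for 645 events is roughly 15.5 ms each) and compares it with the measured average read time. The class and method names (ReadLatencyProbe, budgetMillisPerEvent, averageReadMillis, fallingBehind) are hypothetical and not part of pgjdbc. The reader in main is a stand-in; in the real application something like stream::read could presumably be passed instead.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; none of these names come from pgjdbc.
final class ReadLatencyProbe {

    /** Milliseconds available per event when eventsPerBatch events arrive every periodMillis. */
    static double budgetMillisPerEvent(long periodMillis, int eventsPerBatch) {
        return (double) periodMillis / eventsPerBatch;
    }

    /** Calls the reader count times and returns the average wall-clock milliseconds per call. */
    static double averageReadMillis(Callable<ByteBuffer> reader, int count) throws Exception {
        final long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            reader.call();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / count;
    }

    /** True once the average read time exceeds the per-event budget, so the backlog grows. */
    static boolean fallingBehind(double avgReadMillis, long periodMillis, int eventsPerBatch) {
        return avgReadMillis > budgetMillisPerEvent(periodMillis, eventsPerBatch);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader (assumption): replace with the replication stream's read() in a real test.
        final Callable<ByteBuffer> fakeReader = () -> ByteBuffer.allocate(16);

        final double avg = averageReadMillis(fakeReader, 645);
        System.out.printf("budget: %.2f ms/event, measured: %.4f ms/event, falling behind: %b%n",
                budgetMillisPerEvent(10_000, 645), avg, fallingBehind(avg, 10_000, 645));
    }
}
```

Logging the average once per batch of 645 events should show whether the read time keeps climbing towards the budget, which is the point at which the consumer can no longer keep up.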

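I am still not sure how to explain this or how to fix it. I will keep investigating.

Update 2

I tried adding buffer.clear() at the end of the loop, without success. I still get the 100% CPU and memory problems. This was expected, since the buffer is a local variable and therefore becomes eligible for garbage collection after each iteration anyway. But I thought it was worth testing regardless.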
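For reference, a small sketch of why buffer.clear() could not have helped: ByteBuffer.clear() only resets the position and limit and neither erases nor releases the underlying memory. The class name BufferClearDemo and the method firstByteAfterClear are hypothetical, used only for this illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; it only uses java.nio.ByteBuffer from the JDK.
final class BufferClearDemo {

    /** Clears the buffer, then returns the byte at index 0 to show the data is still present. */
    static byte firstByteAfterClear(ByteBuffer buffer) {
        buffer.clear();        // position = 0, limit = capacity; the content is not erased or freed
        return buffer.get(0);  // absolute read, does not move the position
    }

    public static void main(String[] args) {
        final ByteBuffer buffer = ByteBuffer.allocate(8);
        buffer.put((byte) 42);
        buffer.flip();

        final byte first = firstByteAfterClear(buffer);
        System.out.println("first byte after clear(): " + first
                + ", position: " + buffer.position()
                + ", limit: " + buffer.limit());
    }
}
```

The memory behind a buffer is reclaimed only when nothing references it any more, so clearing a short-lived local buffer does not change what the garbage collector can free.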

Answer 1

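I found the cause of my out-of-memory problem. I was testing with the decoderbufs output plugin (https://github.com/xstevens/decoderbufs). When I replaced it with the built-in test_decoding plugin or with wal2json (https://github.com/eulerto/wal2json), I did not have these problems.

I will try to notify the author of the decoderbufs plugin.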

Postgres / JDBC / logical replication - out-of-memory problem
