Kafka Connect JDBC source runs out of memory when loading millions of records from MSSQL Server

Asked: 2019-07-07 02:47:18

Tags: apache-kafka apache-kafka-connect

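I am trying to load 77 million records from MSSQL Server into a Kafka topic using the Kafka Connect JDBC source connector.

I tried batching by setting batch.max.rows to 1000. Even then, after the first 1000 records the connector uses up all available memory. Please share any suggestions for getting this to work.

Below are the connector configurations I tried: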

curl -X POST http://test.com:8083/connectors -H "Content-Type: application/json" -d '{
        "name": "mssql_jdbc_rsitem_pollx",
        "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:sqlserver://test:1433;databaseName=xxx",
                "connection.user": "xxxx",
                "connection.password": "xxxx",
                "topic.prefix": "mssql-rsitem_pollx-",
                "mode": "incrementing",
                "table.whitelist": "test",
                "timestamp.column.name": "itemid",
                "max.poll.records": "100",
                "max.poll.interval.ms": "3000",
                "validate.non.null": false
        }
}'

curl -X POST http://test.com:8083/connectors -H "Content-Type: application/json" -d '{
        "name": "mssql_jdbc_test_polly",
        "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "tasks.max": "10",
                "connection.url": "jdbc:sqlserver://test:1433;databaseName=xxx;defaultFetchSize=10000;useCursorFetch=true",
                "connection.user": "xxxx",
                "connection.password": "xxxx",
                "topic.prefix": "mssql-rsitem_polly-",
                "mode": "incrementing",
                "table.whitelist": "test",
                "timestamp.column.name": "itemid",
                "poll.interval.ms": "86400000",
                "validate.non.null": false
        }
}'
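
For comparison, here is a minimal sketch of a configuration that actually sets batch.max.rows, which neither request above does. It is not a confirmed fix. It assumes itemid is a strictly increasing, non-null column, and it uses incrementing.column.name, which is the setting that incrementing mode reads (timestamp.column.name only applies to the timestamp modes). The connector name, file name, topic prefix, and the selectMethod=cursor driver property are illustrative assumptions; whether a server-side cursor helps in this case is untested.

```shell
# Write a hypothetical connector config to a file so it can be reviewed before posting.
# The connector name, topic prefix and file name are made up for this example.
cat > mssql_jdbc_test_incr.json <<'EOF'
{
  "name": "mssql_jdbc_test_incr",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:sqlserver://test:1433;databaseName=xxx;selectMethod=cursor",
    "connection.user": "xxxx",
    "connection.password": "xxxx",
    "topic.prefix": "mssql-test-",
    "mode": "incrementing",
    "table.whitelist": "test",
    "incrementing.column.name": "itemid",
    "batch.max.rows": "1000",
    "poll.interval.ms": "5000",
    "validate.non.null": "false"
  }
}
EOF

# Submit it to the Connect REST API (host taken from the question):
# curl -X POST http://test.com:8083/connectors -H "Content-Type: application/json" -d @mssql_jdbc_test_incr.json
```

A single task is used here because the connector assigns whole tables to tasks, so one whitelisted table cannot be spread across ten tasks.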

1 answer:

Answer 0 (score: 0)

Try increasing the Java heap size. Run the following on the command line before starting the Connect worker:

export KAFKA_HEAP_OPTS="-Xms1g -Xmx2g"

You can change the "Xmx2g" part to match the memory available on your machine.
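
The variable only takes effect if it is exported in the same shell (or service definition) that launches the worker. A rough sketch of that sequence follows; the 4g value, the script name, and the properties path are assumptions and depend on your host and installation.

```shell
# Heap sizes are an assumption; choose values that fit the worker host's memory.
export KAFKA_HEAP_OPTS="-Xms1g -Xmx4g"

# Confirm what the worker's JVM will be started with.
echo "Connect worker heap options: $KAFKA_HEAP_OPTS"

# Start the worker from this same shell so it inherits the variable.
# Script name and properties path are assumptions; adjust for your installation.
# bin/connect-distributed.sh config/connect-distributed.properties
```

A larger heap only raises the ceiling. If the connector still tries to hold the whole result set in memory, the error is likely to return with a larger table.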