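I am building a data synchronizer that captures data changes from a MySQL source and exports the data to Hive.
I chose Kafka Connect to implement this, with Debezium as the source connector and the Confluent HDFS connector as the sink connector.
The problem is that Debezium names its Kafka topics like this: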
serverName.databaseName.tableName
In the Confluent HDFS sink properties, I have to set topics to the same name that Debezium generates:
"topics": "serverName.databaseName.tableName"
The Confluent HDFS sink connector then generates a path in HDFS such as:
/topics/serverName.databaseName.tableName/partition=0
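This will certainly cause problems in HDFS/Hive, because the path contains the . character. In fact, the external table that the Confluent HDFS sink connector creates automatically fails because of the path problem: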
2020-05-08T00:42:02,717 ERROR [pool-6-thread-31] metastore.RetryingHMSHandler: MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://localhost:9000./null/topics/dbserver1.test_data_1.student1)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6935)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:2050)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at com.sun.proxy.$Proxy26.create_table_with_environment_context(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:14800)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:14784)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://localhost:9000./null/topics/dbserver1.test_data_1.student1
at org.apache.hadoop.fs.Path.initialize(Path.java:263)
at org.apache.hadoop.fs.Path.<init>(Path.java:254)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:143)
at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:147)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1852)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1786)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:2035)
... 20 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hdfs://localhost:9000./null/topics/dbserver1.test_data_1.student1
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:260)
... 26 more
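So is there any way to change Debezium's default topic naming convention, or to change the default path that the Confluent HDFS sink connector generates from the topic name?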
Answer 0 (score: 0)
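The HDFS connector will replace dots (and dashes) with underscores when creating Hive tables.
HDFS itself does not care about dots in the path. The problem is that there cannot be a dot after the port, and that the path contains /null: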
hdfs://localhost:9000./null
"is there any way to change Debezium's default topic naming convention"
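The solution has nothing to do with Debezium. You can use RegexRouter, which is part of the Apache Kafka Connect library, in the transforms configuration of either the source connector or the sink connector, depending on where you want to "fix" the problem.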
You can also write your own transform and place it on Connect's plugin.path.
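
As a rough illustration of the RegexRouter suggestion, the sketch below holds a set of transform properties in a Python dict and approximates what the rename would do to a topic name. The alias dotsToUnderscores, the regex, and the replacement are assumptions chosen for this example, not values from the question. The Python helper only mimics the rename (RegexRouter itself uses Java regex syntax, where groups are written $1, $2, ...), so treat it as a way to check the pattern rather than as the connector's actual behaviour.

```python
import re

# Transform properties that could be merged into a connector configuration.
# "dotsToUnderscores" is an arbitrary alias picked for this sketch; the regex
# and replacement are assumptions that fit the three-part Debezium topic name.
transform_config = {
    "transforms": "dotsToUnderscores",
    "transforms.dotsToUnderscores.type":
        "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.dotsToUnderscores.regex": r"([^.]+)\.([^.]+)\.([^.]+)",
    "transforms.dotsToUnderscores.replacement": "$1_$2_$3",
}


def route_topic(topic: str) -> str:
    """Approximate the rename in Python: rewrite the topic only when the
    whole name matches the pattern, otherwise return it unchanged."""
    pattern = transform_config["transforms.dotsToUnderscores.regex"]
    match = re.fullmatch(pattern, topic)
    if match is None:
        return topic
    # Python equivalent of the Java-style replacement "$1_$2_$3".
    return match.expand(r"\g<1>_\g<2>_\g<3>")


if __name__ == "__main__":
    print(route_topic("dbserver1.test_data_1.student1"))
```

With the topic from the stack trace, the helper prints dbserver1_test_data_1_student1. A name that does not have exactly three dot-separated parts is returned unchanged.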