I am trying to sink data from Kafka -> Flink -> Hive using the code snippet below, but I am getting the error shown after the code:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<GenericRecord> stream = readFromKafka(env);

private static final TypeInformation[] FIELD_TYPES = new TypeInformation[]{
        BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO
};

JDBCAppendTableSink sink = JDBCAppendTableSink.builder()
        .setDrivername("org.apache.hive.jdbc.HiveDriver")
        .setDBUrl("jdbc:hive2://hiveconnstring")
        .setUsername("myuser")
        .setPassword("mypass")
        .setQuery("INSERT INTO testHiveDriverTable (key,value) VALUES (?,?)")
        .setBatchSize(1000)
        .setParameterTypes(FIELD_TYPES)
        .build();

// Map each Avro GenericRecord onto a two-field Row matching FIELD_TYPES.
DataStream<Row> rows = stream.map((MapFunction<GenericRecord, Row>) st1 -> {
    Row row = new Row(2);
    row.setField(0, st1.get("SOME_ID"));
    row.setField(1, st1.get("SOME_ADDRESS"));
    return row;
});

sink.emitDataStream(rows);
env.execute("Flink101");
Caused by: java.lang.RuntimeException: Execution of JDBC statement failed.
at org.apache.flink.api.java.io.jdbc.JDBCOutputFormat.flush(JDBCOutputFormat.java:219)
at org.apache.flink.api.java.io.jdbc.JDBCSinkFunction.snapshotState(JDBCSinkFunction.java:43)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:90)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:356)
... 12 more
Caused by: java.sql.SQLException: Method not supported
at org.apache.hive.jdbc.HiveStatement.executeBatch(HiveStatement.java:381)
at org.apache.flink.api.java.io.jdbc.JDBCOutputFormat.flush(JDBCOutputFormat.java:216)
... 17 more
I checked the hive-jdbc driver source, and it seems that executeBatch() is indeed not supported by the driver:
public class HiveStatement implements java.sql.Statement {
  ...
  @Override
  public int[] executeBatch() throws SQLException {
    throw new SQLFeatureNotSupportedException("Method not supported");
  }
  ...
}
Is there any way we can achieve this with the Hive JDBC driver?
Please let me know.
Thanks in advance.
Answer 0 (score: 1)
Hive's JDBC implementation is not yet complete. Your problem is tracked by this issue.
You could try to patch Flink's JDBCOutputFormat so that it does not use batching: replace upload.addBatch with upload.execute at JDBCOutputFormat.java:202 and remove the call to upload.executeBatch at JDBCOutputFormat.java:216. The downside is that every record is then emitted as its own SQL query, which might slow things down.