I have some questions about using the Java API to import data from MySQL into HDFS. This is not a bug report. Here is my demo:
// RDBMS link
MLink rdbmsLink = client.createLink("generic-jdbc-connector");
MConfigList configs = rdbmsLink.getConnectorLinkConfig();
configs.getStringInput("linkConfig.jdbcDriver").setValue("com.mysql.jdbc.Driver");
configs.getStringInput("linkConfig.connectionString").setValue("jdbc:mysql://127.0.0.1:3306/sqoop_test");
configs.getStringInput("linkConfig.username").setValue("root");
configs.getStringInput("linkConfig.password").setValue("123456789");
// use a space as the identifier-enclose character instead of the default double quote
rdbmsLink.getConnectorLinkConfig("dialect").getStringInput("dialect.identifierEnclose").setValue(" ");
rdbmsLink.setName("mysql-append-link");
Status fromStatus = client.saveLink(rdbmsLink);
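By the way, saveLink returns a validation Status; a minimal sketch of checking it before moving on (this check is my addition, not part of the original demo):
// Optional check (my addition): make sure link validation passed before continuing.
if (!fromStatus.canProceed()) {
    throw new RuntimeException("RDBMS link validation failed: " + fromStatus);
}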
This link is nothing special.
MLink hdfsLink = client.createLink("hdfs-connector");
hdfsLink.setName("hdfs-append-link");
hdfsLink.setCreationUser("root");
MLinkConfig toLinkConfig = hdfsLink.getConnectorLinkConfig();
toLinkConfig.getStringInput("linkConfig.uri").setValue("hdfs://cdh:9000/");
client.saveLink(hdfsLink);
This link is also routine. Now comes the important part:
MConfigList jobConfig = job.getFromJobConfig();
jobConfig.getStringInput("fromJobConfig.sql").setValue("SELECT a.`jobid`,a.`userid`,a.`jobname`,a.`joblink`,a.`jobdate` ,b.`username` FROM `job_msg` as a LEFT JOIN `user_msg` as b ON a.`userid` = b.`userid` WHERE ${CONDITIONS}");
jobConfig.getStringInput("fromJobConfig.partitionColumn").setValue("jobdate");
jobConfig.getStringInput("incrementalRead.checkColumn").setValue("jobdate");
jobConfig.getStringInput("incrementalRead.lastValue").setValue("2018-08-09 00:11:11");
I set fromJobConfig.sql to a query that uses a LEFT JOIN. When I start the job it works, but when I looked at the sqoop2-server log I found a problem:
New maximal value for incremental import is 2019-03-13 17:00:13.0
Using min/max query: SELECT MIN("jobdate"), MAX("jobdate") FROM (SELECT a.`jobid`, a.`userid`, a.`jobname`, a.`joblink`, a.`jobdate`, b.`username` FROM `job_msg` as a LEFT JOIN `user_msg` as b ON a.`userid` = b.`userid` WHERE 1 = 1) SQOOP_SUBQUERY_ALIAS WHERE "jobdate" > ? AND "jobdate" <= ?
We can see that the generated SQL is unreasonable: the condition ("jobdate" > ? AND "jobdate" <= ?) is outside the subquery, so the database has to evaluate the whole LEFT JOIN before applying the range filter, and an index on jobdate cannot help. If there are many rows of data, I think it will be slow. I looked at the Sqoop 1.99.7 source code:
sb.setLength(0);
sb.append("SELECT ");
sb.append("MAX(").append(executor.encloseIdentifier(jobConf.incrementalRead.checkColumn)).append(") ");
sb.append("FROM ");
sb.append(fromFragment);
String incrementalNewMaxValueQuery = sb.toString();
LOG.info("Incremental new max value query: " + incrementalNewMaxValueQuery);

try (
    PreparedStatement columnTypeStatement = executor.prepareStatement("SELECT " + executor.encloseIdentifier(jobConf.incrementalRead.checkColumn) + " FROM " + fromFragment + " WHERE 1 = 2");
    ResultSet columnTypeResultSet = columnTypeStatement.executeQuery();
    Statement statement = executor.createStatement();
    ResultSet rs = statement.executeQuery(incrementalNewMaxValueQuery)
) {
    ResultSetMetaData checkColumnMetaData = columnTypeResultSet.getMetaData();
    checkColumnScale = checkColumnMetaData.getScale(1);
    checkColumnType = checkColumnMetaData.getColumnType(1);
    if (!rs.next()) {
        throw new SqoopException(GenericJdbcConnectorError.GENERIC_JDBC_CONNECTOR_0022);
    }
incrementalNewMaxValueQuery is the lookup SQL. Now I don't know whether this is a mistake in my Sqoop2 job demo or a place where Sqoop could be improved.
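To show why the predicate ends up outside, here is a minimal sketch of how I understand the query gets assembled (an assumption inferred from the log above, simplified, not copied from the connector source):
// Simplified illustration (assumption based on the log, not the real connector code):
// for a query-based job, ${CONDITIONS} becomes "1 = 1" and the whole user query is
// wrapped as a derived table, so later predicates are appended OUTSIDE the subquery.
String userSql = "SELECT a.`jobid`, ..., b.`username` FROM `job_msg` as a "
    + "LEFT JOIN `user_msg` as b ON a.`userid` = b.`userid` WHERE 1 = 1";
String fromFragment = "(" + userSql + ") SQOOP_SUBQUERY_ALIAS";
String minMaxQuery = "SELECT MIN(\"jobdate\"), MAX(\"jobdate\") FROM " + fromFragment
    + " WHERE \"jobdate\" > ? AND \"jobdate\" <= ?";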
Thanks.