How do I import data using the Sqoop Java API?

Time: 2016-06-27 07:50:27

Tags: scala sqoop

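I want to import data with Sqoop, but I don't want to use shell commands. How can I do this with the Java API? The Sqoop version is 1.4.6, and I'm using Scala + SBT. By the way, which dependencies do I need?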

1 answer:

Answer 0: (score: 0)

I needed to import data from MySQL into Hive with Sqoop, from Scala, on a Cloudera CDH 5.7 cluster, so I started by following this answer.

The problem was that it did not pick up the correct configuration when executed on the server.

Running Sqoop manually looks like this:

sqoop import --hive-import --connect "jdbc:mysql://host/db" \
--username "username" --password "password" --table "viewName" \
--hive-table "outputTable" -m 1 --check-column "dateColumnName" \
--last-value "lastMinDate" --incremental append

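So in the end I chose to run it as an external process using Scala's sys.process.ProcessBuilder. Running it this way requires no SBT dependencies. The runner ended up implemented like this: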

import sys.process._

def executeSqoop(connectionString: String, username: String, password: String, 
                   viewName: String, outputTable: String, 
                   dateColumnName: String, lastMinDate: String) = { 
  // Log every line the process writes to stdout and stderr respectively.
  // `log` is assumed to be a logger already in scope; it is not defined in this snippet.
  val sqoopLogger = ProcessLogger(
    normalLine => log.debug(normalLine),
    errorLine => errorLine match {
      case line if line.contains("ERROR") => log.error(line)
      case line if line.contains("WARN") => log.warning(line)
      case line if line.contains("INFO") => log.info(line)
      case line => log.debug(line)
    }
  )

  // Build the Sqoop command: each parameter and each value must be a separate String in the Seq
  val command = Seq("sqoop", "import", "--hive-import",
    "--connect", connectionString,
    "--username", username,
    "--password", password,
    "--table", viewName,
    "--hive-table", outputTable,
    "-m", "1",
    "--check-column", dateColumnName,
    "--last-value", lastMinDate,
    "--incremental", "append")

  // result will contain the exit code of the command
  val result = command ! sqoopLogger
  if (result != 0) {
    log.error("The Sqoop process did not finish successfully")
  } else {
    log.info("The Sqoop process finished successfully")
  }
}
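
A variation on the runner above, as a minimal sketch rather than the exact code I ran: it builds the argument list in a separate pure function and returns the exit code to the caller, so the command can be inspected or unit-tested without launching Sqoop. The names SqoopCommand, build and run are made up for illustration, and println / Console.err stand in for whatever logger the project actually uses.

```scala
import scala.sys.process._

// Hypothetical helper object; the names are illustrative only.
object SqoopCommand {
  // Builds the argument list for an incremental Hive import.
  // Each flag and each value is its own element, so no shell quoting is involved.
  def build(connectionString: String, username: String, password: String,
            viewName: String, outputTable: String,
            dateColumnName: String, lastMinDate: String): Seq[String] =
    Seq("sqoop", "import", "--hive-import",
      "--connect", connectionString,
      "--username", username,
      "--password", password,
      "--table", viewName,
      "--hive-table", outputTable,
      "-m", "1",
      "--check-column", dateColumnName,
      "--last-value", lastMinDate,
      "--incremental", "append")

  // Runs the command as an external process, forwarding stdout and stderr
  // line by line, and returns the exit code (0 means success).
  def run(command: Seq[String]): Int = {
    val logger = ProcessLogger(
      out => println(out),
      err => Console.err.println(err))
    command ! logger
  }
}
```

Calling SqoopCommand.run(SqoopCommand.build(...)) then gives the caller the exit code, so it can decide whether to retry, abort, or just log the failure. This still assumes the sqoop executable is on the PATH of the machine that runs it.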