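I want to import data with Sqoop, but I don't want to use shell commands. How can I do this with the Java API? The Sqoop version is 1.4.6, and I'm using Scala + SBT. By the way, which dependencies do I need?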
Answer 0 (score: 0)
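I needed to use Sqoop to import data from MySQL into Hive, using Scala on a Cloudera CDH 5.7 cluster, so I started by following this answer.
The problem was that it did not pick up the correct configuration when executed on the server.
Running Sqoop manually looked like this: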
sqoop import --hive-import --connect "jdbc:mysql://host/db" \
--username "username" --password "password" --table "viewName" \
--hive-table "outputTable" -m 1 --check-column "dateColumnName" \
--last-value "lastMinDate" --incremental append
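So in the end I chose to run it as an external process using Scala's sys.process.ProcessBuilder. Running it this way does not require any SBT dependencies. The runner ended up being implemented like this: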
import scala.sys.process._

// `log` is assumed to be a logger already in scope; it is not defined in this snippet.
def executeSqoop(connectionString: String, username: String, password: String,
                 viewName: String, outputTable: String,
                 dateColumnName: String, lastMinDate: String): Unit = {
  // Log every line the process writes to stdout and stderr, respectively.
  // stderr lines are routed by the log level they mention.
  val sqoopLogger = ProcessLogger(
    normalLine => log.debug(normalLine),
    errorLine => errorLine match {
      case line if line.contains("ERROR") => log.error(line)
      case line if line.contains("WARN") => log.warning(line)
      case line if line.contains("INFO") => log.info(line)
      case line => log.debug(line)
    }
  )

  // Build the Sqoop command; every parameter and every value must be a separate String in the Seq
  val command = Seq("sqoop", "import", "--hive-import",
    "--connect", connectionString,
    "--username", username,
    "--password", password,
    "--table", viewName,
    "--hive-table", outputTable,
    "-m", "1",
    "--check-column", dateColumnName,
    "--last-value", lastMinDate,
    "--incremental", "append")

  // `!` blocks until the process exits and returns its exit code
  val result = command ! sqoopLogger
  if (result != 0) {
    log.error("The Sqoop process did not finish successfully")
  } else {
    log.info("The Sqoop process finished successfully")
  }
}
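
The runner above depends on a `log` object that is not shown. As a rough, self-contained sketch of the same idea, the variant below splits building the argument list from running it, and captures output lines instead of logging them. The names `SqoopRunner`, `buildCommand` and `run` are made up for illustration and are not part of Sqoop or of the original runner; it only uses `scala.sys.process` from the standard library.

```scala
import scala.collection.mutable.ListBuffer
import scala.sys.process._

object SqoopRunner {
  // Hypothetical helper: builds the argument list only, so it can be
  // inspected on a machine that does not have Sqoop installed.
  def buildCommand(connectionString: String, username: String, password: String,
                   viewName: String, outputTable: String,
                   dateColumnName: String, lastMinDate: String): Seq[String] =
    Seq("sqoop", "import", "--hive-import",
      "--connect", connectionString,
      "--username", username,
      "--password", password,
      "--table", viewName,
      "--hive-table", outputTable,
      "-m", "1",
      "--check-column", dateColumnName,
      "--last-value", lastMinDate,
      "--incremental", "append")

  // Hypothetical helper: runs a command, blocks until it exits, and returns
  // the exit code together with the captured stdout and stderr lines.
  // Throws java.io.IOException if the executable cannot be started.
  def run(command: Seq[String]): (Int, List[String], List[String]) = {
    val out = ListBuffer.empty[String]
    val err = ListBuffer.empty[String]
    val logger = ProcessLogger(
      line => { out += line; () },  // stdout lines
      line => { err += line; () }   // stderr lines
    )
    val exitCode = command ! logger
    (exitCode, out.toList, err.toList)
  }
}
```

It could then be called as SqoopRunner.run(SqoopRunner.buildCommand(...)) with the same seven values passed to executeSqoop, and the returned exit code checked against 0 as before.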