Sqoop incremental type lastmodified does not perform an incremental import

Time: 2012-10-09 07:23:56

Tags: import mapreduce hdfs sqoop

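I am using Sqoop v1.4.2 to run incremental imports through saved jobs. The first job is:
sqoop job --create job_1 -- import --connect <CONNECT_STRING> --username <UNAME> --password <PASSWORD> -m <MAPPER#> --split-by <COLUMN> --target-dir <TARGET_DIR> --table <TABLE> --check-column <COLUMN> --incremental append --last-value 1

Notes:

  1. The incremental type is append.
  2. Job creation succeeds.
  3. Repeated executions of the job succeed.
  4. I can see the new rows being imported into HDFS.

The second job is:
sqoop job --create job_2 -- import --connect <CONNECT_STRING> --username <UNAME> --password <PASSWORD> -m <MAPPER#> --split-by <COLUMN> --target-dir <TARGET_DIR> --table <TABLE> --check-column <COLUMN> --incremental lastmodified --last-value 1981-01-01

Notes:

  1. The incremental type is lastmodified.
  2. Job creation succeeds; the table name is different from the one used in job_1.
  3. Job execution succeeds only the first time.
  4. I can see the imported rows in HDFS after the first execution.
  5. Subsequent executions fail with the following error: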

      ERROR security.UserGroupInformation: PriviledgedActionException as:<MY_UNIX_USER>(auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory <TARGET_DIR_AS_SPECIFIED_IN_job_2> already exists
      ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory <TARGET_DIR_AS_SPECIFIED_IN_job_2> already exists
          at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
          at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
          at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
          at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
          at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
          at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
          at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
          at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:202)
          at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:465)
          at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:108)
          at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:403)
          at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
          at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
          at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
          at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
          at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
          at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
          at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
          at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
          at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
      

1 answer:

Answer 0 (score: 0)

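If you want to execute job_2 repeatedly, you need to use --incremental lastmodified together with --append, so that each run adds its files to the existing target directory instead of failing because that directory already exists: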

sqoop job --create job_2 -- import --connect <CONNECT_STRING> --username <UNAME> \
    --password <PASSWORD> --table <TABLE> --incremental lastmodified --append \
    --check-column <COLUMN> --last-value "2017-11-05 02:43:43" \
    --target-dir <TARGET_DIR> -m 1
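
Since a saved job named job_2 already exists in the question, it presumably has to be deleted before it can be created again with the new options. Below is a minimal dry-run sketch of that sequence, assuming a POSIX shell. The variable names (SQOOP, CONNECT, UNAME, and so on) and the function names are made up for illustration; by default the script only prints the sqoop commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: by default this only PRINTS the sqoop commands.
# Set SQOOP=sqoop in the environment to actually execute them.
SQOOP="${SQOOP:-echo sqoop}"

# Placeholders taken from the question; supply real values for a real run.
CONNECT="${CONNECT:-<CONNECT_STRING>}"
UNAME="${UNAME:-<UNAME>}"
PASSWORD="${PASSWORD:-<PASSWORD>}"
TABLE="${TABLE:-<TABLE>}"
COLUMN="${COLUMN:-<COLUMN>}"
TARGET_DIR="${TARGET_DIR:-<TARGET_DIR>}"

# Hypothetical helper: drop the old job_2 definition (created without --append)
# and define it again with --incremental lastmodified --append.
recreate_job() {
    $SQOOP job --delete job_2
    $SQOOP job --create job_2 -- import \
        --connect "$CONNECT" --username "$UNAME" --password "$PASSWORD" \
        --table "$TABLE" --target-dir "$TARGET_DIR" -m 1 \
        --incremental lastmodified --append \
        --check-column "$COLUMN" --last-value "2017-11-05 02:43:43"
}

# Hypothetical helper: execute the saved job twice, showing its saved
# parameters in between so they can be inspected between runs.
run_twice() {
    $SQOOP job --exec job_2
    $SQOOP job --show job_2
    $SQOOP job --exec job_2
}

recreate_job
run_twice
```

Running the script as is prints five sqoop command lines, which can be reviewed before switching to SQOOP=sqoop.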