Apache Sqoop: the --where clause has no effect when importing with Sqoop

Time: 2016-10-23 20:13:20

Tags: hadoop sqoop

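Can anyone explain the output of this command? The departments table initially has 6 rows (department_id 2 to 7). I then added 2 new records (department_id 8 and 9) to the MySQL table retail_db.departments. What I want is to select only the newly added records with the --where argument and append them (--append) to the existing HDFS directory for departments. When I run the command below, it creates a new file, part-m-00006 (the original 6 records had been split across part-m-00000 to part-m-00005), but that file contains every record from department_id 2 to 9, including the 2 new ones, so the output below shows duplicate records.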

I don't understand why it ignores the where clause:

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--query "Select * from departments where \$CONDITIONS" \
--where "department_id > 7" \
--append \
-m 1 \
--target-dir /user/cloudera/sqoop_import/departments

Output:
------------------------------------------
[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/sqoop_import/departments/part*
2,Fitness
3,Footwear
4,Apparel
5,Golf
6,Outdoors
7,Fan Shop
2,Fitness
3,Footwear
4,Apparel
5,Golf
6,Outdoors
7,Fan Shop
8,Sports
9,Jewellery

------------------------------------------

Logs generated:
------------------------------------------
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/10/23 12:23:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.0
16/10/23 12:23:30 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/10/23 12:23:31 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/10/23 12:23:31 INFO tool.CodeGenTool: Beginning code generation
16/10/23 12:23:31 INFO manager.SqlManager: Executing SQL statement: Select * from departments where (1 = 0)
16/10/23 12:23:31 INFO manager.SqlManager: Executing SQL statement: Select * from departments where (1 = 0)
16/10/23 12:23:31 INFO manager.SqlManager: Executing SQL statement: Select * from departments where (1 = 0)
16/10/23 12:23:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/b704a6e6d921fb544ba25c6343b18a36/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/23 12:23:33 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/b704a6e6d921fb544ba25c6343b18a36/QueryResult.jar
16/10/23 12:23:33 INFO mapreduce.ImportJobBase: Beginning query import.
16/10/23 12:23:34 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/10/23 12:23:35 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/10/23 12:23:36 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
16/10/23 12:23:38 INFO db.DBInputFormat: Using read commited transaction isolation
16/10/23 12:23:38 INFO mapreduce.JobSubmitter: number of splits:1
16/10/23 12:23:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477192024680_0012
16/10/23 12:23:40 INFO impl.YarnClientImpl: Submitted application application_1477192024680_0012
16/10/23 12:23:40 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1477192024680_0012/
16/10/23 12:23:40 INFO mapreduce.Job: Running job: job_1477192024680_0012
16/10/23 12:23:56 INFO mapreduce.Job: Job job_1477192024680_0012 running in uber mode : false
16/10/23 12:23:56 INFO mapreduce.Job: map 0% reduce 0%
16/10/23 12:24:25 INFO mapreduce.Job: map 100% reduce 0%
16/10/23 12:24:26 INFO mapreduce.Job: Job job_1477192024680_0012 completed successfully
16/10/23 12:24:27 INFO mapreduce.Job: Counters: 30

1 answer:

Answer 0 (score: 2)

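You are using --query together with --where. That is why Sqoop does not honor --where.

--query is a superset of --where: with a free-form query, the WHERE condition comes from the query itself, and a separate --where argument is ignored.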

That is why the log shows your query with $CONDITIONS replaced and no department_id filter:

INFO manager.SqlManager: Executing SQL statement: Select * from departments where (1 = 0)
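To see why the filter never reaches MySQL, the small bash sketch below mimics only the string substitution visible in that log line: Sqoop swaps $CONDITIONS for a predicate, here the (1 = 0) metadata probe the log prints. It is an illustration under that assumption, not Sqoop's code, and the substitute function is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Illustration only: mimics the $CONDITIONS text substitution seen in the log.
# substitute() is a hypothetical helper, not part of Sqoop.
set -euo pipefail

substitute() {
  local query=$1 predicate=$2
  local token='$CONDITIONS'   # literal placeholder text, not a shell variable
  # Replace every occurrence of the placeholder with the predicate.
  printf '%s\n' "${query//"$token"/$predicate}"
}

# Query from the question: the --where filter is not part of it.
original='Select * from departments where $CONDITIONS'
# Query from the answer: the filter sits next to $CONDITIONS.
fixed='select * from departments where department_id > 7 AND $CONDITIONS'

substitute "$original" '(1 = 0)'
substitute "$fixed" '(1 = 0)'
```

Running it prints the question's query exactly as it appears in the log, with no department_id filter, followed by the answer's query with the filter still in place.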

Use one of the following, not both:

  • --query "select * from departments where department_id > 7 AND \$CONDITIONS"

  • --table departments --where "department_id > 7"
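Putting either option into a complete command, a bash sketch could look like the following. The connection string, user name and target directory are copied from the question; -P replaces --password, as the log's warning suggests. The run function and DRY_RUN variable are hypothetical conveniences of this sketch, not Sqoop features: with DRY_RUN=1, or when sqoop is not on the PATH, it prints each argument on its own line instead of importing.

```shell
#!/usr/bin/env bash
# Sketch: append only the new departments rows (department_id > 7) to the
# existing HDFS directory. Connection details are copied from the question.
set -euo pipefail

CONNECT="jdbc:mysql://quickstart.cloudera:3306/retail_db"
TARGET_DIR="/user/cloudera/sqoop_import/departments"

# Option 1: free-form query. The filter is written into --query itself, next to
# $CONDITIONS; single quotes stop the shell from expanding $CONDITIONS.
# -P prompts for the password instead of putting it on the command line.
query_import=(sqoop import
  --connect "$CONNECT"
  --username retail_dba -P
  --query 'select * from departments where department_id > 7 AND $CONDITIONS'
  --append -m 1
  --target-dir "$TARGET_DIR")

# Option 2: table import. --where is honored here because --table is used
# instead of --query.
table_import=(sqoop import
  --connect "$CONNECT"
  --username retail_dba -P
  --table departments
  --where "department_id > 7"
  --append -m 1
  --target-dir "$TARGET_DIR")

# Hypothetical helper (not part of Sqoop): with DRY_RUN=1, print one argument
# per line instead of executing the command.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
}

# Fall back to a dry run when sqoop is not installed on this machine.
command -v sqoop >/dev/null 2>&1 || DRY_RUN=1

# Run only one option; running both would append rows 8 and 9 twice.
run "${table_import[@]}"
```

Swap in "${query_import[@]}" to use the free-form variant; either way only the rows with department_id 8 and 9 should be appended.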