I am new to Hadoop. I am trying to run the following Sqoop import, but it fails.
sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --query "SELECT order_items.order_item_product_id, orders.order_status FROM orders INNER JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE \$CONDITIONS" --target-dir /user/cloudera/order_join1 --split-by order_id --num-mappers 4
The query runs fine in mysql and in sqoop eval, but when I run it with sqoop import I get the following error:
[cloudera@quickstart ~]$ sqoop import --connect
"jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba
--password cloudera --query "SELECT order_items.order_item_product_id,
orders.order_status FROM orders INNER JOIN order_items ON orders.order_id =
order_items.order_item_order_id WHERE \$CONDITIONS" --target-dir
/user/cloudera/order_join1 --split-by order_id --num-mappers 4
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/01/15 14:34:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
17/01/15 14:34:59 WARN tool.BaseSqoopTool: Setting your password on the
command-line is insecure. Consider using -P instead.
17/01/15 14:35:00 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/01/15 14:35:00 INFO tool.CodeGenTool: Beginning code generation
17/01/15 14:35:02 INFO manager.SqlManager: Executing SQL statement: SELECT
order_items.order_item_product_id, orders.order_status FROM orders INNER
JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE (1 = 0)
17/01/15 14:35:02 INFO manager.SqlManager: Executing SQL statement: SELECT
order_items.order_item_product_id, orders.order_status FROM orders INNER
JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE (1 = 0)
17/01/15 14:35:03 INFO manager.SqlManager: Executing SQL statement: SELECT
order_items.order_item_product_id, orders.order_status FROM orders INNER
JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE (1 = 0)
17/01/15 14:35:03 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/f6cf89b54d33e5676419b1646a648100/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/01/15 14:35:10 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/f6cf89b54d33e5676419b1646a648100/QueryResult.jar
17/01/15 14:35:10 INFO mapreduce.ImportJobBase: Beginning query import.
17/01/15 14:35:12 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/01/15 14:35:15 INFO Configuration.deprecation: mapred.map.tasks is
deprecated. Instead, use mapreduce.job.maps
17/01/15 14:35:16 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/10.0.2.15:8032
17/01/15 14:35:20 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:862)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeInternal(DFSOutputStream.java:830)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:826)
17/01/15 14:35:23 INFO db.DBInputFormat: Using read commited transaction isolation
17/01/15 14:35:23 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(order_id), MAX(order_id) FROM (SELECT
order_items.order_item_product_id, orders.order_status FROM orders INNER
JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE (1 = 1) ) AS t1
17/01/15 14:35:24 INFO mapreduce.JobSubmitter: Cleaning up the staging area
/user/cloudera/.staging/job_1484512313628_0005
17/01/15 14:35:24 WARN security.UserGroupInformation:
PriviledgedActionException as:cloudera (auth:SIMPLE) cause:java.io.IOException:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'order_id' in 'field list'
17/01/15 14:35:24 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'order_id' in 'field list'
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:207)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:305)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:203)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:176)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:273)
at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:748)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:509)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'order_id' in 'field list'
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
at com.mysql.jdbc.Util.getInstance(Util.java:360)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:978)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2526)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1446)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:178)
... 22 more
Can anyone help? What is wrong with the command, and why do we have to use WHERE \$CONDITIONS in the query?
Answer 0 (score: 0)
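You probably need to include the order_id column in the select list. The BoundingValsQuery line in your log shows Sqoop wrapping your query in a subquery to compute the split boundaries (SELECT MIN(order_id), MAX(order_id) FROM (...) AS t1), so the --split-by column has to be one of the columns your query returns. Your query returns only order_item_product_id and order_status, which is why MySQL reports "Unknown column 'order_id' in 'field list'".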
sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username retail_dba --password cloudera --query "SELECT orders.order_id, order_items.order_item_product_id, orders.order_status FROM orders INNER JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE \$CONDITIONS" --target-dir /user/cloudera/order_join1 --split-by order_id --num-mappers 4
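As for why WHERE \$CONDITIONS is required: as far as I understand it, Sqoop treats $CONDITIONS as a placeholder and swaps in a different condition at each stage, which matches the (1 = 0) and (1 = 1) variants in your log. The script below is only a rough sketch of that substitution using plain string handling; it is not Sqoop's code. The substitute helper, the bounds 1 and 100, and the exact shape of the per-mapper range conditions are assumptions made up for illustration.

```shell
#!/bin/sh
# Illustration only: imitates with plain string substitution what the Sqoop
# log suggests happens to the $CONDITIONS placeholder. Not Sqoop's own code.

# substitute QUERY CONDITION: replace the first literal $CONDITIONS in QUERY.
substitute() {
  printf '%s%s%s\n' "${1%%\$CONDITIONS*}" "$2" "${1#*\$CONDITIONS}"
}

# Single quotes keep $CONDITIONS literal, like \$CONDITIONS inside double quotes.
query='SELECT orders.order_id, order_items.order_item_product_id, orders.order_status FROM orders INNER JOIN order_items ON orders.order_id = order_items.order_item_order_id WHERE $CONDITIONS'
split_col='order_id'

# 1. Metadata probe: (1 = 0) returns no rows, only the column names and types.
probe=$(substitute "$query" '(1 = 0)')

# 2. Bounding query: the whole query becomes subquery t1, so split_col must be
#    one of its output columns or MySQL reports "Unknown column".
bounds="SELECT MIN($split_col), MAX($split_col) FROM ($(substitute "$query" '(1 = 1)') ) AS t1"

# 3. One query per mapper, each restricted to its own range of split_col.
#    lo and hi stand in for the MIN/MAX result; the values are hypothetical.
lo=1
hi=100
mappers=4
step=$(( (hi - lo + 1) / mappers ))
mapper_queries=''
i=0
while [ "$i" -lt "$mappers" ]; do
  start=$(( lo + i * step ))
  if [ "$i" -eq $(( mappers - 1 )) ]; then
    # The last range is closed so that the maximum value is included.
    cond="( $split_col >= $start ) AND ( $split_col <= $hi )"
  else
    cond="( $split_col >= $start ) AND ( $split_col < $(( start + step )) )"
  fi
  mapper_queries="$mapper_queries$(substitute "$query" "$cond")
"
  i=$(( i + 1 ))
done

printf '%s\n' "$probe" "$bounds"
printf '%s' "$mapper_queries"
```

Running it prints the probe query, the bounding query, and four range-restricted queries. If you would rather not add order_id to the select list, Sqoop also has a --boundary-query option that lets you supply the MIN/MAX query yourself (for example against the orders table directly); I have not tested that with this join.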