Cloudera Sqoop import with a SQL query: trouble with the "where" clause

Asked: 2018-02-02 15:28:26

Tags: mysql cloudera sqoop

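I have a database in Cloudera. Using two of its tables, I am trying to find the records in the accounts table that have exactly one device recorded in the accountdevice table. To do this, I put together the following Sqoop command: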

[training@localhost ~]$ sqoop import -P \
> --connect jdbc:mysql://localhost/loudacre \
> --username training \
> --target-dir /ZXS107020/loudacre/pset1 \
> --split-by accounts.acct_num \
> --query 'SELECT first_name, last_name, acct_num, city, state FROM accounts JOIN accountdevice ON (accounts.acct_num = accountdevice.account_id) WHERE $CONDITIONS AND count(accountdevice.account_id) = 1'

However, this does not work and produces the following output:

18/02/02 07:13:16 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: Invalid use of group function
java.sql.SQLException: Invalid use of group function
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:996)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2530)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1907)
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2030)
    at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:753)
    at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:762)
    at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
    at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
    at org.apache.sqoop.manager.SqlManager.getColumnTypesForQuery(SqlManager.java:234)
    at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:304)
    at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
18/02/02 07:13:16 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

The structure of the tables I am using is as follows:

mysql> describe accountdevice;
+-------------------+--------------+------+-----+---------+----------------+
| Field             | Type         | Null | Key | Default | Extra          |
+-------------------+--------------+------+-----+---------+----------------+
| id                | int(11)      | NO   | PRI | NULL    | auto_increment |
| account_id        | int(11)      | NO   | MUL | NULL    |                |
| device_id         | int(11)      | NO   | MUL | NULL    |                |
| activation_date   | datetime     | NO   |     | NULL    |                |
| account_device_id | varchar(255) | NO   |     | NULL    |                |
+-------------------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

mysql> describe accounts;
+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| acct_num       | int(11)      | NO   | PRI | NULL    |       |
| acct_create_dt | datetime     | NO   |     | NULL    |       |
| acct_close_dt  | datetime     | YES  |     | NULL    |       |
| first_name     | varchar(255) | NO   |     | NULL    |       |
| last_name      | varchar(255) | NO   |     | NULL    |       |
| address        | varchar(255) | NO   |     | NULL    |       |
| city           | varchar(255) | NO   |     | NULL    |       |
| state          | varchar(255) | NO   |     | NULL    |       |
| zipcode        | varchar(255) | NO   |     | NULL    |       |
| phone_number   | varchar(255) | NO   |     | NULL    |       |
| created        | datetime     | NO   |     | NULL    |       |
| modified       | datetime     | NO   |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+

The query I am trying to run is: select the account information for customers that have only one device registered.

What am I doing wrong? I have tried using 'WHERE $CONDITIONS AND' as well as "WHERE \$CONDITIONS".

1 Answer:

Answer 0: (score: 1)

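I suggest running the query in MySQL first, so you can verify that it works on its own. I think the problem is in the query itself: "Invalid use of group function" is raised by MySQL, not by Sqoop, because an aggregate such as COUNT() cannot be used in a WHERE clause.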
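One possible rewrite, as a minimal sketch: group the joined rows per account and move the COUNT() test into a HAVING clause, keeping $CONDITIONS in the WHERE clause for Sqoop. This assumes "only one device" means exactly one accountdevice row per account. The connection settings, target directory, and split column are copied from the question; the variable and function names are hypothetical, and I have not run this against the loudacre database.

```shell
#!/usr/bin/env bash
# Sketch only: same import as in the question, with the aggregate moved to HAVING.

# Single quotes stop the shell from expanding $CONDITIONS; Sqoop replaces the
# token itself. The pieces are concatenated into a single-line query.
SINGLE_DEVICE_QUERY='SELECT first_name, last_name, acct_num, city, state '\
'FROM accounts JOIN accountdevice ON (accounts.acct_num = accountdevice.account_id) '\
'WHERE $CONDITIONS '\
'GROUP BY acct_num, first_name, last_name, city, state '\
'HAVING COUNT(accountdevice.account_id) = 1'

# Hypothetical wrapper, so that nothing runs until the function is called.
import_single_device_accounts() {
  sqoop import -P \
    --connect jdbc:mysql://localhost/loudacre \
    --username training \
    --target-dir /ZXS107020/loudacre/pset1 \
    --split-by accounts.acct_num \
    --query "$SINGLE_DEVICE_QUERY"
}
```

Calling import_single_device_accounts starts the import. Since the split column is also the first grouping column, all rows of one account should fall into the same split, so the per-account count is not divided between mappers.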

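Another thing to consider is that the --query option of sqoop import is meant for a single statement, and the documentation warns about using it with complex queries.

From the Sqoop documentation: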

  

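The facility of using free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no OR conditions in the WHERE clause. Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results.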

Suggestion 1: Run the query in MySQL, write the result into a new table in MySQL, and then import the records from that new table with Sqoop.
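A rough sketch of how Suggestion 1 could look, assuming the training user is allowed to create tables in loudacre. The staging table name single_device_accounts and the variable and function names are made up for illustration; the connection settings and target directory are taken from the question. This is untested.

```shell
#!/usr/bin/env bash
# Sketch only: materialize the result in MySQL, then do a plain table import.

# Hypothetical name for the staging table.
STAGING_TABLE='single_device_accounts'

# CREATE TABLE ... AS SELECT stores the accounts that have exactly one device.
STAGING_SQL="CREATE TABLE ${STAGING_TABLE} AS "\
"SELECT first_name, last_name, acct_num, city, state "\
"FROM accounts JOIN accountdevice ON (accounts.acct_num = accountdevice.account_id) "\
"GROUP BY acct_num, first_name, last_name, city, state "\
"HAVING COUNT(accountdevice.account_id) = 1"

# Hypothetical wrapper, so that nothing runs until the function is called.
stage_and_import() {
  # Step 1: build the staging table (the mysql client prompts for the password).
  mysql -u training -p loudacre -e "$STAGING_SQL" || return 1
  # Step 2: import the staging table; it has no primary key, so the split
  # column is named explicitly.
  sqoop import -P \
    --connect jdbc:mysql://localhost/loudacre \
    --username training \
    --table "$STAGING_TABLE" \
    --split-by acct_num \
    --target-dir /ZXS107020/loudacre/pset1
}
```

With this approach Sqoop only sees a simple table, so the free-form query limitations do not apply; the cost is an extra table in MySQL that has to be dropped or refreshed between runs.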

Suggestion 2: Create a stored procedure in MySQL that contains the complex query, and call it through the --query option of sqoop import, as shown below:

-- Create the stored procedure in MySQL (COUNT() is tested in HAVING after GROUP BY)
mysql> DELIMITER //
mysql> CREATE PROCEDURE simpleprocforimport (OUT param1 INT)
    -> BEGIN
    ->   SELECT first_name, last_name, acct_num, city, state
    ->   FROM accounts JOIN accountdevice ON (accounts.acct_num = accountdevice.account_id)
    ->   GROUP BY acct_num, first_name, last_name, city, state
    ->   HAVING COUNT(accountdevice.account_id) = 1;
    -> END//
mysql> DELIMITER ;

# From sqoop import, call the procedure as shown below
sqoop import -P \
--connect jdbc:mysql://localhost/loudacre \
--username training \
--target-dir /ZXS107020/loudacre/pset1 \
--split-by accounts.acct_num \
--query "CALL simpleprocforimport (@a);"

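I have not tested this in a MySQL environment, so let me know if you run into any problems. Keep in mind that the Sqoop documentation says a --query value must include the $CONDITIONS token, so the CALL statement may not be accepted as written.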