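I have a database in Cloudera. Using two of its tables, I am trying to find the records in the accounts table for accounts that have only one device recorded in accountdevice. To do this, I wrote the following query: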
[training@localhost ~]$ sqoop import -P \
> --connect jdbc:mysql://localhost/loudacre \
> --username training \
> --target-dir /ZXS107020/loudacre/pset1 \
> --split-by accounts.acct_num \
> --query 'SELECT first_name, last_name, acct_num, city, state FROM accounts JOIN accountdevice ON (accounts.acct_num = accountdevice.account_id) WHERE $CONDITIONS AND count(accountdevice.account_id) = 1'
However, this does not work and produces the following output:
18/02/02 07:13:16 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: Invalid use of group function
java.sql.SQLException: Invalid use of group function
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:996)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3887)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3823)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2435)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2582)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2530)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1907)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2030)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:753)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:762)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForQuery(SqlManager.java:234)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:304)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
18/02/02 07:13:16 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1651)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
The structure of the tables I am using is as follows:
mysql> describe accountdevice;
+-------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| account_id | int(11) | NO | MUL | NULL | |
| device_id | int(11) | NO | MUL | NULL | |
| activation_date | datetime | NO | | NULL | |
| account_device_id | varchar(255) | NO | | NULL | |
+-------------------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> describe accounts;
+----------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| acct_num | int(11) | NO | PRI | NULL | |
| acct_create_dt | datetime | NO | | NULL | |
| acct_close_dt | datetime | YES | | NULL | |
| first_name | varchar(255) | NO | | NULL | |
| last_name | varchar(255) | NO | | NULL | |
| address | varchar(255) | NO | | NULL | |
| city | varchar(255) | NO | | NULL | |
| state | varchar(255) | NO | | NULL | |
| zipcode | varchar(255) | NO | | NULL | |
| phone_number | varchar(255) | NO | | NULL | |
| created | datetime | NO | | NULL | |
| modified | datetime | NO | | NULL | |
+----------------+--------------+------+-----+---------+-------+
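The query I am trying to run is: select the account information for customers that have only one device registered.
What am I doing wrong? I have tried using 'WHERE $CONDITIONS AND' as well as "WHERE \$CONDITIONS".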
Answer 0 (score: 1)
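I suggest running the query in MySQL first, so you can verify that it works. I think the query itself is the problem: "Invalid use of group function" is a MySQL error, raised here because count() is an aggregate function and cannot appear in a WHERE clause.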
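As a minimal sketch of what a working query might look like, the count can be moved out of the WHERE clause into a GROUP BY ... HAVING. The aliases a and ad are my own, and I am assuming that each row in accountdevice represents one device registered to an account. I have not run this against the loudacre database.

```sql
-- Accounts that have exactly one row in accountdevice.
-- Assumption: one accountdevice row = one registered device.
SELECT a.first_name, a.last_name, a.acct_num, a.city, a.state
FROM accounts AS a
JOIN accountdevice AS ad ON a.acct_num = ad.account_id
GROUP BY a.acct_num, a.first_name, a.last_name, a.city, a.state
HAVING COUNT(ad.id) = 1;
```

If this returns the expected rows in MySQL, the same statement could be passed to --query by adding WHERE $CONDITIONS between the JOIN and the GROUP BY, since the WHERE clause has to come before the grouping.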
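Another thing to consider is that the sqoop import --query option is meant for a single statement, and the documentation warns against using complex queries.
From the Sqoop docs:
The facility of using free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no OR conditions in the WHERE clause. Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results.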
Suggestion 1: Run the query in MySQL and write the result into a new table in MySQL, then import the records from that new table with sqoop.
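Suggestion 1 could be sketched roughly as follows. The table name single_device_accounts is hypothetical, and the query assumes that one accountdevice row corresponds to one registered device. This is untested against the loudacre database.

```sql
-- Materialize the result in MySQL so that Sqoop only has to read a plain table.
-- single_device_accounts is a hypothetical name chosen for illustration.
CREATE TABLE single_device_accounts AS
SELECT a.acct_num, a.first_name, a.last_name, a.city, a.state
FROM accounts AS a
JOIN accountdevice AS ad ON a.acct_num = ad.account_id
GROUP BY a.acct_num, a.first_name, a.last_name, a.city, a.state
HAVING COUNT(ad.id) = 1;
```

The import would then use --table single_device_accounts in place of --query. A table created this way has no primary key, so Sqoop will probably still need --split-by acct_num or -m 1.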
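Suggestion 2: Create a stored procedure in MySQL that contains the complex query you want to run, and call it through the --query option of sqoop import, as shown below: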
-- creating the stored procedure in MySQL (COUNT moved into GROUP BY ... HAVING)
mysql> DELIMITER //
mysql> CREATE PROCEDURE simpleprocforimport (OUT param1 INT)
-> BEGIN
-> SELECT first_name, last_name, acct_num, city, state
-> FROM accounts JOIN accountdevice ON (accounts.acct_num = accountdevice.account_id)
-> GROUP BY acct_num, first_name, last_name, city, state
-> HAVING COUNT(accountdevice.account_id) = 1;
-> END//
mysql> DELIMITER ;
# From the sqoop import, just call the procedure as shown below
sqoop import -P \
--connect jdbc:mysql://localhost/loudacre \
--username training \
--target-dir /ZXS107020/loudacre/pset1 \
--split-by accounts.acct_num \
--query "CALL simpleprocforimport (@a);"
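I have not tested this in a MySQL environment, so let me know if you run into any problems. One caveat: the Sqoop documentation requires a free-form query to contain the $CONDITIONS token, so the CALL statement may be rejected. If it is, use suggestion 1 instead.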