I am trying to import a 70+ GB table into Hive on the HDP 2.3.2 sandbox. I have set up the connection between SQL Server and the sandbox, but when I try to import the table with the following command:
sudo -u hdfs sqoop import --connect "jdbc:sqlserver://XX.XX.XX.XX;database=XX;username=XX;password=XX" --table XX --split-by ID --target-dir "/user/hdfs/Kunal/2" --hive-import -- --schema dbo
it fails with the following error:
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.UnsupportedOperationException: Java Runtime Environment (JRE) version 1.7 is not supported by this driver. Use the sqljdbc4.jar class library, which provides support for JDBC 4.0.
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: java.lang.UnsupportedOperationException: Java Runtime Environment (JRE) version 1.7 is not supported by this driver. Use the sqljdbc4.jar class library, which provides support for JDBC 4.0.
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:220)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:165)
... 9 more
Answer 0 (score: 0)
Option 1: Use a single mapper (-m 1); note the single hyphen. However, the entire 70 GB will be read in a single thread, so you may see a long delay before the import completes, and the output may be written to a single HDFS file.
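As a minimal sketch, the single-mapper variant could look like the command below, reusing the connection string, table, and target directory from the question (the XX placeholders are kept as-is):

sudo -u hdfs sqoop import --connect "jdbc:sqlserver://XX.XX.XX.XX;database=XX;username=XX;password=XX" --table XX -m 1 --target-dir "/user/hdfs/Kunal/2" --hive-import -- --schema dbo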
Option 2: Use --split-by with a sparsely (evenly) distributed column. --split-by specifies the column of the table used to split the units of work. For example, Employee_id in an emp table would be unique and evenly distributed.
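A sketch of the split-by variant with parallel mappers, again reusing the question's connection details; here ID is assumed to be a unique, evenly distributed key, and the mapper count of 4 is only an illustration:

sudo -u hdfs sqoop import --connect "jdbc:sqlserver://XX.XX.XX.XX;database=XX;username=XX;password=XX" --table XX --split-by ID -m 4 --target-dir "/user/hdfs/Kunal/2" --hive-import -- --schema dbo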
Reference: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html (the latest Sqoop user guide).