Error loading sequence file data using sqoop-generated code

Asked: 2019-05-14 18:17:00

Tags: apache-spark hadoop sqoop

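I imported data into sequence files with sqoop and then tried to load that data in spark-shell. The code generated by sqoop references classes in the com.cloudera.sqoop.lib package. Running the following command in spark-shell produces these errors: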

  val ordersRDD = sc.sequenceFile("/user/pawinder/problem1-seq/orders",classOf[org.apache.hadoop.io.IntWritable],classOf[com.problem1.retaildb.orders])
    warning: Class com.cloudera.sqoop.lib.SqoopRecord not found - continuing with a stub.
    warning: Class com.cloudera.sqoop.lib.LargeObjectLoader not found - continuing with a stub.
    warning: Class com.cloudera.sqoop.lib.LargeObjectLoader not found - continuing with a stub.
    warning: Class com.cloudera.sqoop.lib.DelimiterSet not found - continuing with a stub.
    warning: Class com.cloudera.sqoop.lib.DelimiterSet not found - continuing with a stub.
    warning: Class com.cloudera.sqoop.lib.DelimiterSet not found - continuing with a stub.
    warning: Class com.cloudera.sqoop.lib.RecordParser not found - continuing with a stub.
    error: Class com.cloudera.sqoop.lib.SqoopRecord not found - continuing with a stub.

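Can I instruct sqoop to generate code that does not depend on the cloudera packages? Do I need to add the jar file containing the com.cloudera.sqoop.lib package when starting spark-shell? If so, where can I find that jar file? Or should I write the value class myself so that it does not depend on the com.cloudera.sqoop.lib package?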

I am using the cloudera quickstart vm. Any help is much appreciated.

  

Edit: I added sqoop-1.4.6.2.6.5.0-292.jar to spark2-shell:

 spark-shell --jars problem1/bin/orders.jar,/usr/hdp/2.6.5.0-292/sqoop/sqoop-1.4.6.2.6.5.0-292.jar

I tried to work around this by defining a case class for Orders, but that did not help. The MapReduce job still references classes from the com.cloudera.sqoop package:

scala> case class Orders(order_id:Int,order_date:java.sql.Timestamp,customer_id:Int,status:String)
defined class Orders
scala> val ordersRDD = sc.sequenceFile("/user/pawinder/problem1-seq/orders",classOf[org.apache.hadoop.io.IntWritable],classOf[Orders])
 ordersRDD: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.IntWritable, Orders)] = /user/pawinder/problem1-seq/orders HadoopRDD[0] at sequenceFile at <console>:26

scala> ordersRDD.count
    19/05/14 14:29:21 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
    java.lang.NoClassDefFoundError: com/cloudera/sqoop/lib/SqoopRecord
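
A note on why the case class probably cannot help: a SequenceFile records the names of its key and value classes in the file header, and the reader instantiates those recorded classes no matter which class is passed to `sc.sequenceFile`. So the generated `com.problem1.retaildb.orders` class, and the `SqoopRecord` class it extends, still have to be on the classpath of both the driver and the executors. Once they are, the records can be converted to a case class after reading. Below is a minimal sketch of that conversion, not a verified fix. It assumes the generated record's `toString` yields a comma-delimited line (sqoop's default output delimiter) and that no field contains a comma; `parseOrder` and the sample line in the comments are hypothetical.

```scala
import java.sql.Timestamp

// Hypothetical target type, mirroring the case class defined above.
case class Orders(order_id: Int, order_date: Timestamp, customer_id: Int, status: String)

// Hypothetical helper: parses one comma-delimited record line,
// e.g. "1,2013-07-25 00:00:00.0,11599,CLOSED" (illustrative sample).
// Assumes exactly four fields and no commas inside the status field.
def parseOrder(line: String): Orders = {
  val f = line.split(",", -1)
  require(f.length == 4, s"expected 4 fields, got ${f.length}: $line")
  Orders(f(0).trim.toInt, Timestamp.valueOf(f(1).trim), f(2).trim.toInt, f(3).trim)
}

// In spark-shell, with orders.jar and the sqoop jar both passed via --jars.
// The key class is kept as in the question and is not verified against the
// file header:
// val raw = sc.sequenceFile("/user/pawinder/problem1-seq/orders",
//   classOf[org.apache.hadoop.io.IntWritable],
//   classOf[com.problem1.retaildb.orders])
// val orders = raw.map { case (_, v) => parseOrder(v.toString) }
```

Calling `toString` inside the `map` copies each record right away, which matters because Hadoop record readers reuse the same Writable instance across records.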

0 answers:

No answers yet.