Is there a way in Apache Spark to automatically format/convert data types to known primitive types?

Asked: 2016-08-30 20:30:40

Tags: apache-spark

In Spark 1.4.1, I am getting an error while performing a join: java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.sql.types.UTF8String. The question is: is there a way to automatically convert data types back and forth to well-known primitives?

java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.sql.types.UTF8String
    at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToDouble$1$$anonfun$apply$48.apply(Cast.scala:354)
    at org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$buildCast(Cast.scala:111)
    at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToDouble$1.apply(Cast.scala:354)
    at org.apache.spark.sql.catalyst.expressions.Cast.eval(Cast.scala:436)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52)
    at org.apache.spark.sql.execution.joins.HashJoin$$anon$1.fetchNext(HashJoin.scala:89)
    at org.apache.spark.sql.execution.joins.HashJoin$$anon$1.hasNext(HashJoin.scala:66)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2016-08-30 16:28:01.172 [WARN] (task-result-getter-3) TaskSetManager:71 - Lost task 0.0 in stage 27.0 (TID 428, localhost): java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.sql.types.UTF8String
    (same stack trace repeated)

1 Answer:

Answer 0 (score: 1)

The problem is clear from the message: "java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.sql.types.UTF8String". You cannot compare a long column with a String column in a SQL statement.
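The mismatch can be illustrated outside Spark (a minimal sketch, not Spark itself): a key stored as a long never equals the same digits stored as a string, so the comparison can only succeed after an explicit type conversion.

```python
# A join key as a long vs. the same key as a string.
long_key = 42
string_key = "42"

# Different types: the values never compare equal as-is.
same_without_cast = long_key == string_key   # False

# After an explicit conversion, the comparison works.
same_with_cast = long_key == int(string_key)  # True
```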

You need to register a UDF (user-defined function) to convert one side so the two columns can be compared:

// Assumes: import org.apache.spark.sql.api.java.UDF1;
//          import org.apache.spark.sql.types.DataTypes;
sqlContext.udf().register("string2Long", new UDF1<String, Long>() {
    @Override
    public Long call(String str) throws Exception {
        // Parse the string key into a long so it can be compared with a long column.
        return Long.valueOf(str);
    }
}, DataTypes.LongType);

Then use the function in your SQL statement: string2Long(stringField), wherever the string column is compared with the long column.
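The effect of the conversion on the join itself can be sketched without Spark. The sample keys and the string2long helper below are hypothetical stand-ins for the registered UDF; they only demonstrate that the join produces matches once the string-typed key is converted to a long.

```python
def string2long(s):
    # Stand-in for the string2Long UDF: parse a string key into an int.
    return int(s)

left = [(1001, "alice"), (1002, "bob")]    # id stored as a long
right = [("1001", "NY"), ("1003", "LA")]   # id stored as a string

# Without conversion: 1001 == "1001" is False, so no rows match.
naive = [(l, r) for l in left for r in right if l[0] == r[0]]

# With conversion: the shared key 1001 now matches.
fixed = [(l, r) for l in left for r in right if l[0] == string2long(r[0])]
```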