I get a strange compilation error when using Scala 2.11 instead of 2.12 (with Spark 2.2.1).
Here is my Scala code:
val spark = SparkSession.builder
  .master("local")
  .appName("spark rmd connect import")
  .enableHiveSupport()
  .getOrCreate()

// LOAD
var time = System.currentTimeMillis()
val r_log_o = spark.read.format("orc").load("log.orc")
val r_log = r_log_o.drop(r_log_o.col("id"))
System.currentTimeMillis() - time

time = System.currentTimeMillis()
r_log_o.toJavaRDD.cache().map((x: Row) => { x(4).asInstanceOf[Timestamp] }).reduce(minTs(_, _))
System.currentTimeMillis() - time
where
def minTs(x: Timestamp, y: Timestamp): Timestamp = {
  if (x.compareTo(y) < 0) return x
  else return y
}
My pom.xml is configured as follows:
    <plugin>
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <version>3.3.1</version>
      <configuration>
        <scalaVersion>2.11</scalaVersion>
      </configuration>
    </plugin>
    <plugin>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.1</version>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
  </plugins>
</build>
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.12</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.1</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.1</version>
  </dependency>
</dependencies>
If I build with <scalaVersion>2.12</scalaVersion> it compiles; with Scala 2.11 I get the following error:
[INFO] /root/project/src/main/java:-1: info: compiling
[INFO] /root/project/src/main/scala:-1: info: compiling
[INFO] Compiling 2 source files to /root/rmd-connect-spark/target/classes at 1515426201592
[ERROR] /root/rmd-connect-spark/src/main/scala/SparkConnectTest.scala:40: error: type mismatch;
[ERROR]  found   : org.apache.spark.sql.Row => java.sql.Timestamp
[ERROR]  required: org.apache.spark.api.java.function.Function[org.apache.spark.sql.Row, ?]
[ERROR]     .map((x: Row) => { x(4).asInstanceOf[Timestamp] })
[ERROR]                    ^
[ERROR] one error found
[INFO] BUILD FAILURE
Note: this is not a Spark runtime problem; it is a problem with using the Spark API under Scala 2.11.
Answer 0 (score: 4)
You have a JavaRDD, so you need to use the Java API and pass an org.apache.spark.api.java.function.Function
rather than a Scala function. Scala 2.12 added support for automatically converting Scala functions to Java SAM (single abstract method) interfaces, which is why this code compiles under Scala 2.12.
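The difference can be reproduced without Spark. Below is a minimal sketch with a hypothetical SAM interface (the name JFunction is illustrative, standing in for org.apache.spark.api.java.function.Function): on every Scala version the explicit anonymous class compiles, while the plain lambda is only converted to the SAM interface automatically on Scala 2.12 and later.

```scala
// Illustrative stand-in for a Java SAM interface such as
// org.apache.spark.api.java.function.Function (the name is hypothetical).
trait JFunction[T, R] { def call(t: T): R }

object SamDemo {
  // Accepts only the SAM interface, the way JavaRDD.map does.
  def applyJ[T, R](f: JFunction[T, R], t: T): R = f.call(t)

  // Works on every Scala version: an explicit anonymous class,
  // which is what the Java API requires under Scala 2.11.
  val explicit: JFunction[Int, Int] =
    new JFunction[Int, Int] { def call(t: Int): Int = t + 1 }

  // Compiles only on Scala 2.12+, where a function literal is
  // automatically converted to the SAM interface.
  val viaSam: JFunction[Int, Int] = (t: Int) => t + 1
}
```

Under Scala 2.11 the asker would therefore have to wrap the mapping logic in an explicit `new Function[Row, Timestamp] { ... }` to keep using the Java API.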
If you want to code in Scala, use the Scala API instead of the Java one.
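For example, a sketch of the Scala-side fix (not tested against the asker's data; the column index 4 is carried over from the question, and Spark is not on the classpath here, so the RDD call is shown as a comment). The reduce logic itself can be checked on a plain Scala collection, since `RDD.reduce` has the same shape:

```scala
import java.sql.Timestamp

object MinTsSketch {
  def minTs(x: Timestamp, y: Timestamp): Timestamp =
    if (x.compareTo(y) < 0) x else y

  // With the Scala API the failing line would become roughly:
  //   r_log_o.rdd.cache().map(x => x.getTimestamp(4)).reduce(minTs)
  // RDD.map here takes an ordinary Scala function, so no SAM
  // conversion is needed and it compiles under Scala 2.11.

  // The same map/reduce shape demonstrated on a plain collection:
  def earliest(ts: Seq[Timestamp]): Timestamp = ts.reduce(minTs)
}
```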