Using IntelliJ IDEA and Maven, I am trying to take a CSV file and turn it into a Hive table (or, for now, Parquet would be fine too). Here is my current code:
import org.apache.spark.sql.SparkSession
import scala.io.Source
import org.apache.spark.sql.types._

object main extends App {
  // Local SparkSession with Hive support enabled.
  val spark = SparkSession.builder.master("local").appName("my-spark-app").enableHiveSupport().getOrCreate()

  // Read the whole CSV into memory, one array element per line.
  val lines = Source.fromFile("C://share_VB/file_name.csv").getLines.toArray
  //val myDF = spark.read.csv("C://share_VB/file_name.csv")
  //myDF.write.save("C://Users/my_name/ParquetFiles")

  for (line <- lines) {
    if (!line.isEmpty) {
      // Split the line on commas and print the fields two at a time.
      val testcase = line.split(",").toBuffer
      println(testcase.head)
      println(testcase(1))
      testcase.remove(0, 2)
      while (testcase.nonEmpty) {
        println(testcase.head)
        println(testcase(1))
        testcase.remove(0, 2)
      }
    }
  }
}
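For reference, the DataFrame route I eventually want is sketched in the two commented-out lines above; spelled out (the CSV options are guesses about my file, and the table name is just a placeholder), it would look roughly like this:

val myDF = spark.read
  .option("header", "true")       // assuming the first line holds column names
  .option("inferSchema", "true")  // let Spark guess the column types
  .csv("C://share_VB/file_name.csv")

myDF.write.parquet("C://Users/my_name/ParquetFiles")  // write out as Parquet
// or, since Hive support is enabled:
// myDF.write.saveAsTable("my_table")                 // hypothetical table name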
The pom.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>seeifthisworks</groupId>
    <artifactId>seeifthisworks</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <scala.version>2.11.8</scala.version>
        <scala.compat.version>2.11</scala.compat.version>
        <spark.version>2.2.0.cloudera1</spark.version>
        <config.version>1.3.2</config.version>
        <scalatest.version>3.0.1</scalatest.version>
        <spark-testing-base.version>2.2.0_0.8.0</spark-testing-base.version>
    </properties>

    <!-- Set repositories first, so that dependencies resolve against these repo URLs. -->
    <repositories>
        <repository>
            <id>Maven</id>
            <url>http://repo1.maven.org/maven2</url>
        </repository>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.compat.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.compat.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
</project>
If I comment out the val spark = SparkSession... line, everything runs perfectly. But if I leave it in and try to run anything, I get this error:
Error: Unable to initialize main class main
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession
Yet I clearly import SparkSession, and Maven lists org.apache.spark:spark-core_2.11:2.2.0.cloudera1 among my libraries, so in theory this should work.
Can someone help me pinpoint the problem and explain how to fix it?
Edit: after removing <scope>provided</scope>, I now get a different error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
at org.apache.spark.util.Utils$.getCallSite(Utils.scala:1440)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:76)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at main$.delayedEndpoint$main$1(main.scala:7)
at main$delayedInit$body.apply(main.scala:6)
at scala.Function0.apply$mcV$sp(Function0.scala:34)
at scala.Function0.apply$mcV$sp$(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App.$anonfun$main$1$adapted(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:389)
at scala.App.main(App.scala:76)
at scala.App.main$(App.scala:74)
at main$.main(main.scala:6)
at main.main(main.scala)
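For what it's worth, the NoSuchMethodError on scala.Predef$.refArrayOps looks like the classic symptom of a Scala binary version mismatch: the Spark artifacts here are built for Scala 2.11, so a 2.12 scala-library on the runtime classpath would produce exactly this. If that is the cause, I assume the fix would be to pin scala-library in the pom to match (a guess on my part, not verified):

<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
</dependency>

and to make sure the Scala SDK configured for the module in IntelliJ is also a 2.11.x version.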