I'm new to Spark and got stuck trying to run my first Spark SQL code. I ran a simple program that loads a JSON file with Spark SQL from the Eclipse IDE. Here is my code:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spSession = SparkSession.builder().appName("First SQL code")
        .master("local[2]").config("spark.sql.warehouse.dir", tempDir).getOrCreate();
Dataset<Row> empDf = spSession.read().json("data/customerData.json");
empDf.show();
empDf.printSchema();
The code throws the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
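This error usually points to a Scala binary-version mismatch on the classpath: a jar compiled against one Scala version running with the scala-library of another. A quick sanity check, sketched here as a suggestion rather than something from the original post, is to print the Scala version actually on the runtime classpath and compare it with the _2.11 suffix of the Spark artifacts in the pom below (mvn dependency:tree also shows which jar pulls in which scala-library):

// Illustrative check (not from the original code): the printed version should
// match the Scala suffix of the Spark artifacts, 2.11.x in this pom.
System.out.println(scala.util.Properties.versionString());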
Here is what my pom dependencies look like:
<dependencies>
  <dependency>
    <!-- Apache Spark main library -->
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.0.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.0.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.0.0</version>
  </dependency>
  <dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.6</version>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.5.1</version>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
  </plugins>
</build>
Answer 0 (score: 0)
There seems to be some issue with the versions of the Spark libraries included in the pom. I tried the same with version 2.3 and it worked fine. Below are the code and the pom I used:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

SparkSession spSession = SparkSession.builder()
        .appName("First SQL code")
        .master("local[*]")
        .getOrCreate();
// textFile() reads the input as plain lines, so this yields a Dataset<String>;
// the format("json") setting is ignored by textFile().
Dataset<String> empDf = spSession.read()
        .format("json")
        .textFile("C:\\Users\\Kiran\\workspace\\JsonSpark\\input\\");
empDf.show();
empDf.printSchema();
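Note that textFile() returns each input line as-is, so empDf here is a Dataset<String> with a single value column rather than parsed JSON. To get the parsed columns the question was after, read().json() does the parsing once the dependency problem is resolved; a minimal sketch, assuming the input is in JSON-lines form (one JSON object per line, Spark's default expectation):

// Illustrative alternative (not from the original answer): parse the JSON
// into columns instead of reading raw lines.
Dataset<Row> parsedDf = spSession.read()
        .json("C:\\Users\\Kiran\\workspace\\JsonSpark\\input\\");
parsedDf.printSchema();   // one column per JSON field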
Dependencies:
<dependency>
  <!-- Apache Spark main library -->
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.3.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.3.0</version>
</dependency>