I have a Scala Maven application that uses Spark, and I work in IntelliJ IDEA. I built an executable jar from it, but when I try to launch it from the Windows console I get an error about a missing class. I can't figure out why, because I already added the dependency to my pom.xml, and when I inspect the jar I can see the library that contains that class:

(screenshot: libraries inside the .jar)

I tried two plugins, maven-shade-plugin and maven-assembly-plugin, with the same result. I also tried putting the library on the classpath explicitly via Project Structure -> Libraries in IntelliJ:

(screenshot: classpath in IDEA)

Any help would be appreciated! Here is my code:
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.ml.recommendation.{ALS, ALSModel}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

import scala.collection.{Map, Set}
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

object RunRecommender {

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .master("local")
      .appName("Recommender Engines with Audioscrobbler data")
      .config("spark.sql.warehouse.dir", "spark-warehouse")
      .getOrCreate()

    // Read the raw Audioscrobbler datasets as plain-text Datasets.
    val rawUserArtistData: Dataset[String] = spark.read.textFile("user_artist_data.txt")
    val rawArtistData: Dataset[String] = spark.read.textFile("artist_data.txt")
    val rawArtistAlias: Dataset[String] = spark.read.textFile("artist_alias.txt")

    // Run the full pipeline: preparation, model building, evaluation, recommendation.
    val runRecommender: RunRecommender = new RunRecommender(spark)
    runRecommender.preparation(rawUserArtistData, rawArtistData, rawArtistAlias)
    runRecommender.model(rawUserArtistData, rawArtistData, rawArtistAlias)
    runRecommender.evaluate(rawUserArtistData, rawArtistAlias)
    runRecommender.recommend(rawUserArtistData, rawArtistData, rawArtistAlias)
  }
}
Here is my pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>recommends</groupId>
<artifactId>recommends</artifactId>
<packaging>jar</packaging>
<name>Recommender Engine with Audioscrobbler data</name>
<version>1.0-SNAPSHOT</version>
<repositories>
<repository>
<id>mavencentral</id>
<name>Maven Central</name>
<url>https://repo1.maven.org/maven2/</url>
<layout>default</layout>
</repository>
</repositories>
<build>
<pluginManagement>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.1</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.0.2</version>
</plugin>
</plugins>
</pluginManagement>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.1</version>
<configuration>
<archive>
<manifest>
<mainClass>RunRecommender</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>scala-compile-first</id>
<phase>process-resources</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
</goals>
</execution>
<execution>
<id>scala-test-compile</id>
<phase>process-test-resources</phase>
<goals>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<artifactId>maven-shade-plugin</artifactId>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<createDependencyReducedPom>false</createDependencyReducedPom>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>RunRecommender</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>2.0.0</version>
</dependency>
</dependencies>
</project>
Here is the stack trace when I try to run the jar:
16/11/06 11:56:40 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
16/11/06 11:56:40 INFO SharedState: Warehouse path is 'spark-warehouse'.
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: text. Please find packages at http://spark-packages.org
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:145)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:78)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:78)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:310)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:492)
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:528)
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:501)
at RunRecommender$.main(RunRecommender.scala:20)
at RunRecommender.main(RunRecommender.scala)
Caused by: java.lang.ClassNotFoundException: text.DefaultSource
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:130)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:130)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:130)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:130)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:130)
... 9 more
16/11/06 11:56:40 INFO SparkContext: Invoking stop() from shutdown hook
Answer 0 (score: 0)

I had a similar problem; in my case, this element was missing from the Maven pom:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.3.2</version>
</dependency>
Answer 1 (score: 0)

This problem is easily solved if you run the same jar with spark-submit instead of java -jar in the terminal.
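For illustration, a minimal spark-submit invocation for the jar built above could look like this (the jar path is an assumption based on the artifactId and version in the pom; adjust it to the actual build output):

# Submit the shaded jar to a local Spark installation.
spark-submit --class RunRecommender --master local target/recommends-1.0-SNAPSHOT.jar

spark-submit works where java -jar fails because it puts the Spark distribution's own jars, with their META-INF/services registrations intact, on the classpath. If a jar runnable with plain java -jar is required, one commonly cited remedy (a sketch, not verified against this exact pom) is to have the shade plugin merge those service files, since Spark locates data sources such as text via java.util.ServiceLoader and default shading can overwrite the entries. The extra transformer goes next to the existing ManifestResourceTransformer:

<!-- merge META-INF/services files from all shaded dependencies -->
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>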