I have been struggling with this problem. I am trying to run a simple Twitter sentiment analysis job that seemed to work earlier but no longer does. I am using Spark 1.3.1 with Scala 2.10.4. I read somewhere that TwitterUtils does not work out of the box with Spark 1.0+, so I tried a workaround. According to the book, everything seems to be in place: the correct Scala directory structure, a fat jar built with sbt assembly, the correct paths. Yet somehow Spark fails to pick up the jar file, and I get a ClassNotFoundException.
What could be wrong, and how do I fix it?
Edit:
Command line:
../bin/spark-submit --class Sentimenter --master local[4] /home/ubuntu/spark/spark_examples/target/scala-2.10/twitter-sentiment-assembly-1.0.jar
Error:
Warning: Local jar /home/ubuntu/spark/spark_examples/target/scala-2.10/twitter-sentiment-assembly-1.0.jar does not exist, skipping.
java.lang.ClassNotFoundException: Sentimenter
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
build.sbt file:
lazy val root = (project in file(".")).
  settings(
    name := "twitter-sentiment",
    version := "1.0",
    scalaVersion := "2.10.4",
    mainClass in Compile := Some("Sentimenter")
  )

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.3.1" % "provided",
  // use %% so the artifact suffix matches scalaVersion (2.10);
  // a hard-coded _2.11 suffix would pull in binaries incompatible with Scala 2.10.4
  "org.apache.spark" %% "spark-streaming-twitter" % "1.3.1"
)

// META-INF discarding
val meta = """META.INF(.)*""".r
assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
  case n if n.startsWith("reference.conf") => MergeStrategy.concat
  case n if n.endsWith(".conf") => MergeStrategy.concat
  case meta(_) => MergeStrategy.discard
  case x => MergeStrategy.first
}
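Note that the assembly / assemblyMergeStrategy keys above come from the sbt-assembly plugin, so project/plugins.sbt must declare it. A minimal sketch, assuming sbt-assembly 0.13.0 (the exact version is an assumption; use any release that provides assemblyMergeStrategy and matches your sbt):

// project/plugins.sbt — sketch; the 0.13.0 version here is an assumption
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")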
Here is the code, which I got from another forum post about Twitter sentiment:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.twitter._

object Sentimenter {
  def main(args: Array[String]) {
    // Twitter4j OAuth credentials
    System.setProperty("twitter4j.oauth.consumerKey", "xxxxxxxxxxxxx")
    System.setProperty("twitter4j.oauth.consumerSecret", "xxxxxxxxxxxx")
    System.setProperty("twitter4j.oauth.accessToken", "xxxxxxxxxxxx")
    System.setProperty("twitter4j.oauth.accessTokenSecret", "xxxxxxxxxx")

    // keywords to track in the Twitter stream
    val filters = Array("Big Data", "geofencing")

    // allowMultipleContexts is needed because a second context is
    // created from the same conf for the StreamingContext below
    val sparkConf = new SparkConf()
      .setAppName("TweetSentiment")
      .setMaster("local[4]")
      .set("spark.driver.allowMultipleContexts", "true")
    val sc = new SparkContext(sparkConf)

    // get the list of positive words
    val pos_list = sc.textFile("/home/ubuntu/spark/src/main/scala/Positive_Words.txt")
      .filter(line => !line.isEmpty())
      .collect()
      .toSet
    // get the list of negative words
    val neg_list = sc.textFile("/home/ubuntu/spark/src/main/scala/Negative_Words.txt")
      .filter(line => !line.isEmpty())
      .collect()
      .toSet

    // create twitter stream; Seconds must be imported from org.apache.spark.streaming
    val ssc = new StreamingContext(sparkConf, Seconds(5))
    val stream = TwitterUtils.createStream(ssc, None, filters)
    val tweets = stream.map(r => r.getText)
    tweets.print() // print tweet text
    ssc.start()
    ssc.awaitTermination()
  }
}
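As an aside, pos_list and neg_list are loaded but never used in this snippet. A minimal sketch of how they might score tweets, inserted before ssc.start(); the whitespace tokenization and the scoring rule are my own assumptions, not part of the original code:

// hypothetical scoring step — assumes whitespace tokenization and lowercase word lists
val scored = tweets.map { text =>
  val words = text.toLowerCase.split("\\s+").toSet
  val score = (words intersect pos_list).size - (words intersect neg_list).size
  (text, score) // score > 0 suggests positive sentiment, score < 0 negative
}
scored.print()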
Answer 0 (score: 2)
I think if you write
../bin/spark-submit --class Sentimenter --master local[4] --jars /home/ubuntu/spark/spark_examples/target/scala-2.10/twitter-sentiment-assembly-1.0.jar
or try rearranging the flags passed to spark-submit, for example:
../bin/spark-submit --jars /home/ubuntu/spark/spark_examples/target/scala-2.10/twitter-sentiment-assembly-1.0.jar --master local[4] --class Sentimenter
you can get it to work.
Answer 1 (score: 0)
It seems to me that Spark may not be able to access your jar file. Try moving the jar file out of your home directory. That worked for me.
Answer 2 (score: 0)
In my case, the problem was that I copied the JAR from S3 onto the cluster in one EMR step and then ran spark-submit from another step. That meant the JAR had been copied into the first step's directory, /mnt/var/lib/hadoop/steps/<StepId>. Subsequently, spark-submit could not find the JAR, because running spark-submit <some-list-of-args> jarfile.jar <some-other-args> looks for the JAR file in /mnt/var/lib/hadoop/steps/<DifferentStepID>.
I solved this by copying the JAR into the home directory and then invoking spark-submit with a file path relative to the step directory, i.e. ../../../../../../home/hadoop/jarfile.jar