I am trying to run the Flume streaming example, but I cannot get my jar files to work. The example at https://github.com/spark-packages/dstream-flume/blob/master/examples/src/main/python/streaming/flume_wordcount.py says to run:
bin/spark-submit --jars \
external/flume-assembly/target/scala-*/spark-streaming-flume-assembly-*.jar
I don't know what this "external" directory is.
In the lib directory of my Spark (1.6.0) installation I put several jars (I tried both 1.6.0 and 1.6.2):
$ pwd
/Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib
$ ls *flume*
spark-streaming-flume-assembly_2.10-1.6.0.jar
spark-streaming-flume-assembly_2.10-1.6.2.jar
spark-streaming-flume-sink_2.10-1.6.2.jar
spark-streaming-flume-sink_2.10-1.6.0.jar
spark-streaming-flume_2.10-1.6.0.jar
spark-streaming-flume_2.10-1.6.2.jar
Then I ran:
$ ./bin/pyspark --master ip:7077 --total-executor-cores 1 --packages com.databricks:spark-csv_2.10:1.4.0
--jars /Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/spark-streaming-flume-sink_2.10-1.6.0.jar
--jars /Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/spark-streaming-flume_2.10-1.6.0.jar
--jars /Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/spark-streaming-flume-assembly_2.10-1.6.0.jar
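A side note on the command above: `--jars` is documented as taking a single comma-separated list, and as far as I know repeating the flag keeps only the last value. Below is a minimal sketch of assembling that value in Python, assuming the same lib directory as above. The helper name `build_jars_arg` is hypothetical, not part of Spark.

```python
import posixpath


def build_jars_arg(lib_dir, jar_names):
    """Join jar paths into the single comma-separated value that --jars expects.

    build_jars_arg is a hypothetical helper for illustration only.
    """
    return ",".join(posixpath.join(lib_dir, name) for name in jar_names)


# Assumption: the lib directory and jar name are the ones from the question.
lib_dir = "/Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib"
jars = build_jars_arg(lib_dir, ["spark-streaming-flume-assembly_2.10-1.6.0.jar"])
print("--jars " + jars)
```

Passing only the assembly jar is probably enough here, since it is meant to bundle the Flume connector together with its dependencies.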
The Python notebook server starts, but when I try to create the Flume stream:
from pyspark.streaming.flume import FlumeUtils
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.streaming import StreamingContext
try : sc.stop()
except : pass
try : ssc.stop()
except : pass
conf = SparkConf()
conf.setAppName("Streaming Flume")
conf.set("spark.executor.memory","1g")
conf.set("spark.driver.memory","1g")
conf.set("spark.cores.max","5")
conf.set("spark.driver.extraClassPath", "/Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/")
conf.set("spark.executor.extraClassPath", "/Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)
FlumeUtils.createStream(ssc, "localhost", "4949")
it fails with:
________________________________________________________________________________________________
Spark Streaming's Flume libraries not found in class path. Try one of the following.
1. Include the Flume library and its dependencies with in the
spark-submit command as
$ bin/spark-submit --packages org.apache.spark:spark-streaming-flume:1.6.0 ...
2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
Group Id = org.apache.spark, Artifact Id = spark-streaming-flume-assembly, Version = 1.6.0.
Then, include the jar in the spark-submit command as
$ bin/spark-submit --jars <spark-streaming-flume-assembly.jar> ...
________________________________________________________________________________________________
I tried adding
--packages org.apache.spark:spark-streaming-flume-sink:1.6.0
at the end of my spark-submit command, but then I ran into another problem:
org.apache.spark#spark-streaming-flume-sink added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
:: resolution report :: resolve 2344ms :: artifacts dl 0ms
:: modules in use:
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
module not found: org.apache.spark#spark-streaming-flume-sink;1.6.0
==== local-m2-cache: tried
file:/Users/romain/.m2/repository/org/apache/spark/spark-streaming-flume-sink/1.6.0/spark-streaming-flume-sink-1.6.0.pom
-- artifact org.apache.spark#spark-streaming-flume-sink;1.6.0!spark-streaming-flume-sink.jar:
file:/Users/romain/.m2/repository/org/apache/spark/spark-streaming-flume-sink/1.6.0/spark-streaming-flume-sink-1.6.0.jar
==== local-ivy-cache: tried
/Users/romain/.ivy2/local/org.apache.spark/spark-streaming-flume-sink/1.6.0/ivys/ivy.xml
==== central: tried
https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-flume-sink/1.6.0/spark-streaming-flume-sink-1.6.0.pom
-- artifact org.apache.spark#spark-streaming-flume-sink;1.6.0!spark-streaming-flume-sink.jar:
https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-flume-sink/1.6.0/spark-streaming-flume-sink-1.6.0.jar
==== spark-packages: tried
http://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-streaming-flume-sink/1.6.0/spark-streaming-flume-sink-1.6.0.pom
-- artifact org.apache.spark#spark-streaming-flume-sink;1.6.0!spark-streaming-flume-sink.jar:
http://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-streaming-flume-sink/1.6.0/spark-streaming-flume-sink-1.6.0.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: org.apache.spark#spark-streaming-flume-sink;1.6.0: not found
::::::::::::::::::::::::::::::::::::::::::::::
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.spark#spark-streaming-flume-sink;1.6.0: not found]
I have never used a pom.xml. Maybe I should?
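One thing stands out in the report above: it looks for `spark-streaming-flume-sink` with no Scala suffix, while every jar listed earlier carries `_2.10` in its name. A small sketch of composing a `--packages` coordinate with that suffix, assuming Scala 2.10 builds as in those file names (the function name is hypothetical):

```python
def spark_package_coordinate(artifact, scala_version, spark_version,
                             group="org.apache.spark"):
    """Build a group:artifact_scalaVersion:version coordinate string.

    Hypothetical helper; the default group is the one used in the question.
    """
    return "{0}:{1}_{2}:{3}".format(group, artifact, scala_version, spark_version)


# Assumption: Scala 2.10 and Spark 1.6.0, matching the jar names in the question.
print(spark_package_coordinate("spark-streaming-flume", "2.10", "1.6.0"))
```

The resulting string can be passed to `--packages` in place of the unsuffixed coordinate. No pom.xml is needed for that.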
Answer 0 (score: 0)
I found my mistake: the jar files I had downloaded were empty!
:-(((((
The error came from this command:
curl -O http://search.maven.org/remotecontent?filepath=org/apache/spark/spark-streaming-flume-sink_2.10/1.6.2/spark-streaming-flume-sink_2.10-1.6.2.jar
which only downloads an HTML file :-(
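A quick way to catch this kind of bad download before handing the file to Spark is to check that it is really a zip archive, since a jar is a zip file. This is a minimal sketch using Python's standard `zipfile` module. The function name `looks_like_jar` and the temporary demo files are made up for illustration.

```python
import os
import tempfile
import zipfile


def looks_like_jar(path):
    """Return True if the file at `path` is a readable zip archive (jars are zips)."""
    return zipfile.is_zipfile(path)


# Demo with throwaway files: an HTML page saved under a .jar name, and a real archive.
tmp_dir = tempfile.mkdtemp()

html_path = os.path.join(tmp_dir, "fake.jar")
with open(html_path, "w") as f:
    f.write("<html><body>not a jar</body></html>")

jar_path = os.path.join(tmp_dir, "real.jar")
with zipfile.ZipFile(jar_path, "w") as zf:
    zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")

print(looks_like_jar(html_path))  # the HTML file is rejected
print(looks_like_jar(jar_path))   # the real archive is accepted
```

Running the same check on each file in the Spark lib directory would have flagged the broken downloads. The Ivy log in the question also shows the direct repository layout under https://repo1.maven.org/maven2/, which is likely a safer place to fetch the jar from than the search page URL.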