1. Following this idea, I packaged my Spark program to run on the cluster, but the following error occurred.
Exception in thread "main" java.io.IOException: No FileSystem for scheme: spark
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
at scala.Option.map(Option.scala:146)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2. The command I am using is as follows:
./bin/spark-submit \
--class com.sparkStreaming.Demo10_HA.DriverHADemo \
--master spark://hadoop01:7077 \
--deploy-mode cluster \
--supervise \
hdfs://mycluster/spark-streaming/submitjars/thirdTest.jar
3. I tried the following solutions, for example:
3.1 The known fix for java.io.IOException: No FileSystem for scheme: hdfs (the maven-shade-plugin configuration below):
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass></mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
The same error is still reported.
3.2 Using --jars:
./bin/spark-submit \
--class com.sparkStreaming.Demo10_HA.DriverHADemo \
--master spark://hadoop01:7077 \
--deploy-mode cluster \
--jars /usr/local/hadoop-2.7.1/share/hadoop/hdfs/*.jar /usr/local/spark-2.2.0-bin-hadoop2.7/jars/*.jar /usr/local/hadoop-2.7.1/share/hadoop/common/*.jar \
--supervise \
hdfs://mycluster/spark-streaming/submitjars/thirdTest.jar
The same error is still reported.
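(A note on syntax: as far as I know, spark-submit expects --jars to be a single comma-separated list, and I do not believe Spark 2.2 expands * wildcards there itself, so a corrected form of the attempt above would look roughly like the sketch below; the exact jar names are assumptions based on a standard Hadoop 2.7.1 layout.)
./bin/spark-submit \
--class com.sparkStreaming.Demo10_HA.DriverHADemo \
--master spark://hadoop01:7077 \
--deploy-mode cluster \
--jars /usr/local/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-2.7.1.jar,/usr/local/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar \
--supervise \
hdfs://mycluster/spark-streaming/submitjars/thirdTest.jar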
3.3 Also modifying the jar in the local Maven repository, for example
D:\install\mavenrepository\org\apache\hadoop\hadoop-common\2.7.1\hadoop-common-2.7.1.jar
by adding the following configuration to its core-site.xml and repackaging it into the local Maven repository:
<!-- global properties -->
<property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
    <description>The FileSystem for hdfs: uris.</description>
</property>
<property>
    <name>fs.file.impl</name>
    <value>org.apache.hadoop.fs.LocalFileSystem</value>
    <description>The FileSystem for file: uris.</description>
</property>
The same error is still reported.
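(For completeness: my understanding is that Spark copies any spark.hadoop.* properties into the Hadoop Configuration, so the two properties above could in principle also be passed per job, without repackaging hadoop-common, roughly like this sketch:)
./bin/spark-submit \
--class com.sparkStreaming.Demo10_HA.DriverHADemo \
--master spark://hadoop01:7077 \
--deploy-mode cluster \
--conf spark.hadoop.fs.hdfs.impl=org.apache.hadoop.hdfs.DistributedFileSystem \
--conf spark.hadoop.fs.file.impl=org.apache.hadoop.fs.LocalFileSystem \
--supervise \
hdfs://mycluster/spark-streaming/submitjars/thirdTest.jar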
4. All three solutions above target the No FileSystem for scheme: hdfs problem, and none of them works for me.
Reading the error message, my guess is that the problem lies in some combination of HDFS and the spark:// scheme, but I do not know how to fix it. Any help would be greatly appreciated.
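(To illustrate my suspicion: I believe the same exception can be reproduced outside Spark by handing the Hadoop shell a spark:// URI, which is why I think the spark:// master URL is somehow being treated as a file path during submission.)
hadoop fs -ls spark://hadoop01:7077/
# expected to fail with something like: ls: No FileSystem for scheme: spark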
5. Finally, here is my Spark code, although I do not think it is related to the problem.
package com.sparkStreaming.Demo10_HA
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
/**
* Created by RollerQing on 2019/11/16 14:23
*/
object DriverHADemo {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName(DriverHADemo.getClass.getSimpleName)
      .getOrCreate()
    val sc: SparkContext = spark.sparkContext

    val ck = "hdfs:qf//spark-streaming/ck_ha"
    // val ck = "D:\\installs\\SparkStreamingTest\\data"

    val ssc: StreamingContext = StreamingContext.getOrCreate(ck, () => {
      val tmpSsc: StreamingContext = new StreamingContext(sc, Seconds(2))
      tmpSsc.checkpoint(ck)
      val ds: DStream[(String, Int)] = tmpSsc.socketTextStream("hadoop01", 8888, StorageLevel.MEMORY_ONLY)
        .flatMap(_.split("\\s+"))
        .map((_, 1))
        .reduceByKey(_ + _)
      ds.print
      tmpSsc
    })