Question

1。我用这个想法将spark程序打包为在集群上运行，但是发生以下错误。

Exception in thread "main" java.io.IOException: No FileSystem for scheme: spark
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

2。我正在使用的命令如下

./bin/spark-submit \
--class com.sparkStreaming.Demo10_HA.DriverHADemo\
--master spark://hadoop01:7077 \
--deploy-mode cluster \
--supervise \
hdfs://mycluster/spark-streaming/submitjars/thirdTest.jar

我已经查询了很多解决方案，但这没用。

例如 3.1 解决此问题的方法：java.io.IOException: No FileSystem for scheme: hdfs

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                 <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <transformers> 
                        <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
                    </transformer>

                    <transformer
                        implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass></mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>

仍然报告相同的错误 3.2使用--jars

./bin/spark-submit \
--class com.sparkStreaming.Demo10_HA.DriverHADemo\
--master spark://hadoop01:7077 \
--deploy-mode cluster \
--jars /usr/local/hadoop-2.7.1/share/hadoop/hdfs/*.jar /usr/local/spark-2.2.0-bin-hadoop2.7/jars/*.jar /usr/local/hadoop-2.7.1/share/hadoop/common/*.jar \
--supervise \
hdfs://mycluster/spark-streaming/submitjars/thirdTest.jar

仍然报告相同的错误 3.3还要修改maven jar包，例如
修改jar包 D:\install\mavenrepository\org\apache\hadoop\hadoop-common\2.7.1\hadoop-common-2.7.1.jar 将以下配置添加到配置文件core-site.xml中，然后重新打包到本地maven存储库中

<!--- global properties -->
<property>
        <name>fs.hdfs.impl</name>
        <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
        <description>The FileSystem for hdfs: uris.</description>
</property>
<property>
        <name>fs.file.impl</name>
        <value>org.apache.hadoop.fs.LocalFileSystem</value>
        <description>The FileSystem for hdfs: uris.</description>
</property>

仍然报告相同的错误

4。以上三种解决方案均基于No FileSystem for scheme: hdfs被解决的问题，对我而言不起作用。

我认为阅读错误消息应该是hdfs和spark的结合，但是我不知道如何解决。请帮助我，非常感谢。

5。最后，我发布了我的spark代码，但我认为它应该与它无关。

package com.sparkStreaming.Demo10_HA

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

/**
 * Created by RollerQing on 2019/11/16 14:23
 */
object DriverHADemo {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName(DriverHADemo.getClass.getSimpleName)
      .getOrCreate()
    val sc : SparkContext = spark.sparkContext


    val ck = "hdfs:qf//spark-streaming/ck_ha"
//    val ck = "D:\\installs\\SparkStreamingTest\\data"

    val ssc : StreamingContext = StreamingContext.getOrCreate(ck, () => {
      val tmpSsc: StreamingContext = new StreamingContext(sc, Seconds(2))

      tmpSsc.checkpoint(ck)

      val ds: DStream[(String, Int)] = tmpSsc.socketTextStream("hadoop01", 8888, StorageLevel.MEMORY_ONLY)
        .flatMap(_.split("\\s+"))
        .map((_, 1))
        .reduceByKey(_ + _)
      ds.print

      tmpSsc
    })

java.io.IOException：方案的无文件系统：spark

0 个答案: