Question

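I keep getting this error:

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs:/filename.txt

I have set up a standalone Spark cluster, and I am trying to run this code on my master node.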

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("Recommendation Engine1")
  .set("spark.executor.memory", "1g")
  .set("spark.driver.memory", "4g")

val sc = new SparkContext(conf)
// Read the input file and keep a 5% sample, without replacement
val rawUserArtistData = sc.textFile("hdfs:/user_artist_data.txt").sample(false, 0.05)

I run it from my terminal with:

spark-submit --class com.latentview.spark.Reco --master spark://MASTERNODE U IP:PORT --deploy-mode client /home/cloudera/workspace/new/Sparksample/target/Sparksample-0.0.1-SNAPSHOT-jar-with-dependencies.jar

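These are the things I have tried:

I replaced hdfs:/filename.txt with the fs.defaultFS path from my core-site.xml file.
I replaced hdfs:/filename.txt with hdfs:// (in case it made any difference).
I replaced hdfs:/ with file://, and then with file:///, to read the file from my local drive.

None of these worked. Is there anything else that could be wrong?

If I run hadoop fs -ls

this is where my file is located.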

Answer 1

Usually the path is:

hdfs://name-nodeIP:8020/path/to/file

In your case it should be:

hdfs://localhost:8020/user_artist_data.txt

or

hdfs://machinname:8020/user_artist_data.txt
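
To see how the short form from the question differs from the fully qualified form above, here is a minimal sketch using only java.net.URI. The object name HdfsUriSketch and the helper qualify are hypothetical, and the host and port (localhost, 8020) are simply the values from this answer; your NameNode address may differ, so check fs.defaultFS in core-site.xml.

```scala
import java.net.URI

object HdfsUriSketch {
  // Hypothetical helper: assembles hdfs://host:port/path from its parts.
  def qualify(host: String, port: Int, path: String): URI =
    new URI("hdfs", null, host, port, path, null, null)

  def main(args: Array[String]): Unit = {
    // The form used in the question: a scheme and a path, but no NameNode host or port.
    val short = new URI("hdfs:/user_artist_data.txt")
    // The fully qualified form (host and port are assumptions taken from the answer).
    val full = qualify("localhost", 8020, "/user_artist_data.txt")

    println(s"short: authority=${short.getAuthority}, path=${short.getPath}")
    println(s"full:  authority=${full.getAuthority}, path=${full.getPath}")
    println(s"pass this to sc.textFile: $full")
  }
}
```

The short form carries no authority at all, so which NameNode it resolves against depends on the Hadoop configuration visible to the driver. The fully qualified form removes that dependency.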

Answer 2

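The org.apache.hadoop.mapred.InvalidInputException error means that Spark could not create the RDD because no file exists at the path "hdfs:/user_artist_data.txt". Try connecting to hdfs://localhost:8020/user_artist_data.txt and check whether the file is there.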
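
One way to run that check from code is the Hadoop FileSystem API, sketched below. It assumes the Hadoop client libraries are on the classpath and that the NameNode is at hdfs://localhost:8020 (an assumption taken from the answers, not a confirmed value). The object name CheckInputPath and the helper existing are hypothetical. Because hadoop fs -ls with no path argument lists the current user's HDFS home directory rather than the root, the sketch also looks for the file there.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object CheckInputPath {
  // Hypothetical helper: keeps only the candidate paths that exist on `fs`.
  def existing(fs: FileSystem, candidates: Seq[Path]): Seq[Path] =
    candidates.filter(p => fs.exists(p))

  def main(args: Array[String]): Unit = {
    // Assumption: NameNode address from the answers; replace with your fs.defaultFS value.
    val fs = FileSystem.get(new URI("hdfs://localhost:8020"), new Configuration())

    val candidates = Seq(
      new Path("/user_artist_data.txt"),                   // HDFS root, as in the question's code
      new Path(fs.getHomeDirectory, "user_artist_data.txt") // the user's HDFS home directory
    )

    val found = existing(fs, candidates)
    if (found.isEmpty) println("File not found at any candidate path")
    else found.foreach(p => println(s"Found: $p"))
  }
}
```

Any path printed as found can be passed to sc.textFile in its fully qualified form.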

