Question

I am trying to load a test file using Spark and Java. The code works fine in client mode (on my local machine), but in cluster mode (i.e., on the server) it throws a FileNotFoundException.

SparkSession spark = SparkSession
                     .builder()
                     .config("spark.mesos.coarse","true")
                     .config("spark.scheduler.mode","FAIR")
                     .appName("1")
                     .master("local")
                     .getOrCreate();

  String fileServerUrl = "https://mywebsiteurl/TestFile.csv";
  spark.sparkContext().addFile(fileServerUrl);
  String[] fileServerUrlArray = fileServerUrl.split("/");
  String fileName = fileServerUrlArray[fileServerUrlArray.length - 1];
  String file = SparkFiles.get(fileName);   // absolute path of the added file on this node
  String modifiedFile = "file://" + file;

  spark.read()
       .option("header", "true")
       .load(modifiedFile);   // FileNotFoundException is thrown on this line

I get a FileNotFoundException.

Answer 1

When a job runs in cluster mode, Spark never writes to the driver's local filesystem. If you can read the file into a buffer, the best option is to use toLocalIterator() or collect(). Please try the following and share whether it works for you.

toLocalIterator()
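
To illustrate the "read the file into a buffer" idea, here is a minimal sketch in plain Java. It is not from the original post: the `CsvBuffer` class and its `readLines` methods are hypothetical names, and the approach assumes the file is small enough to hold in driver memory. The `main` method uses an in-memory sample instead of the real URL so that it runs without network access.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the original post): buffers a small CSV in driver memory.
final class CsvBuffer {

    private CsvBuffer() {}

    // Reads every line of the stream into a list, decoding as UTF-8, then closes the stream.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Opens the URL on the calling machine and buffers its content; only suitable for small files.
    static List<String> readLines(URL url) throws IOException {
        return readLines(url.openStream());
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the downloaded file, so the sketch runs without network access.
        byte[] sample = "id,name\n1,alpha\n2,beta\n".getBytes(StandardCharsets.UTF_8);
        List<String> lines = readLines(new ByteArrayInputStream(sample));
        System.out.println(lines.size() + " lines, header = " + lines.get(0));
    }
}
```

On the driver, the buffered lines could then be handed to Spark with `spark.createDataset(lines, Encoders.STRING())` and parsed with `spark.read().option("header", "true").csv(...)`, which accepts a `Dataset<String>` in Spark 2.2 and later. That way the job does not depend on a driver-local `file://` path that executors on other nodes may not be able to see.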

Load a file using SparkContext.addFile and read it using the load or csv method
