Reading a local file into a Spark dataframe in Zeppelin running in a Docker container

Date: 2018-11-16 00:21:45

Tags: docker apache-spark apache-zeppelin

I am trying to write Spark code in Zeppelin, using the Apache Zeppelin Docker image on my laptop. Everything works as expected except reading files from the local disk. When I try to read a csv file into a Spark dataframe:


val df = spark.read.csv("/User/myname/documents/data/xyz.csv")

I get the following error:

1 Answer:

Answer 0 (score: 0)

I think I found the answer. I pulled a Docker image (I used the one below, but you can change it):

docker pull skymindops/zeppelin-dl4j

Then I ran:

docker run -it --rm -p 7077:7077 -p 8080:8080 --privileged=true -v $PWD/logs:/logs -v $PWD/notebook:/notebook -v $PWD/data:/data \
-e ZEPPELIN_NOTEBOOK_DIR='/notebook' \
-e ZEPPELIN_LOG_DIR='/logs' \
skymindops/zeppelin-dl4j:latest
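Because of the `-v $PWD/data:/data` bind mount in the command above, any file placed in the host's `./data` directory becomes visible at `/data` inside the container. A minimal sketch of preparing such a file (the name `xyz.csv` comes from the question; the CSV contents here are made up for illustration):

```shell
# Create the host-side folder that is bind-mounted to /data in the container
mkdir -p data

# Put a CSV there; inside the container it will appear as /data/xyz.csv
printf 'id,name\n1,alpha\n2,beta\n' > data/xyz.csv
```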

Now files can be read from the data folder:

val df = spark.read.option("header", "true").csv("/data/xyz.csv")

Note that I did not need the notebooks bundled with that image.