I have the following code:
setwd("C:\\Users\\Anonymous\\Desktop\\Data 2014")  # working directory containing Test.csv
Sys.setenv(SPARK_HOME = "C:\\Users\\Anonymous\\Desktop\\Spark-1.4.1\\spark-1.6.0-bin-hadoop2.6\\spark-1.6.0-bin-hadoop2.6")
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.3.0" "sparkr-shell"')  # pull in the spark-csv package at startup
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))  # make the SparkR package bundled with Spark visible to R
library(SparkR)
library(magrittr)
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)
When I run the following:
data <- read.df(sqlContext, "Test.csv", "com.databricks.spark.csv", header="true")
I get the following error:
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NullPointerException
Test.csv is just a 3 x 2 table.
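For context, a hypothetical Test.csv of that shape (the column names and values here are invented, not taken from the original post) would look like:

col1,col2
1,a
2,b
3,c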
Answer (score: 0)
You will find more details and possible causes of the error at the following link: https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/javaionotserializableexception.html
If the file is not in your current working directory, you must provide the full path to the CSV file. Beyond that, I don't have many ideas. I have posted my code below; it works fine for me, so you can try it.
Sys.setenv(SPARK_HOME='/home/jayashree/spark-1.5.0') # path to your Spark home directory; change it to match your installation
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local", sparkPackages="com.databricks:spark-csv_2.11:1.2.0")
sqlContext <- sparkRSQL.init(sc)
data <- read.df(sqlContext, "/full_path/to_your/datafile.csv", "com.databricks.spark.csv", header="true")
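If the read succeeds, you can sanity-check the result with standard SparkR functions:

printSchema(data)  # show the inferred column names and types
head(data)         # pull the first rows back into R as a local data.frame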
Are you working on Windows?
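If you are, one frequent cause of a NullPointerException like this on Windows is a missing winutils.exe. A minimal sketch, assuming you download winutils.exe for Hadoop 2.6 and place it in C:\hadoop\bin (that path is my assumption; use wherever you actually put it), to run before sparkR.init:

Sys.setenv(HADOOP_HOME = "C:\\hadoop")  # assumed location; the directory must contain bin\\winutils.exe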