I installed Spark spark-2.0.0-bin-hadoop2.7 on a Windows 10 PC
and I want to use the SparkR package in R.
However, when I run the following example code:
library(SparkR)
# Initialize SparkSession
sparkR.session(appName = "SparkR-DataFrame-example")
# Create a simple local data.frame
localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))
# Convert local data frame to a SparkDataFrame
df <- createDataFrame(localDF)
it throws this exception:
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:/Users/Louagyd/Desktop/EDU%20%202016-2017/Data%20Analysis/spark-warehouse
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:114)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:145)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
at org.apache.spark.sql.internal.SessionState.analyzer(Session
Any idea how to fix this?
Answer 0 (score: 0)
I ran into the same error, and nothing I found online helped. The stack trace points at the cause: Spark derives the default spark-warehouse directory from the current working directory, and here that path contains spaces (EDU%20%202016-2017), which Spark 2.0.0 cannot turn into a valid URI on Windows. Pointing spark.sql.warehouse.dir at a space-free path works around it. I solved the problem with the following steps:
In the RStudio script window, run the following commands in this order:
# Set the working directory to the folder in which the R project was created
setwd("C:/home/Project/SparkR")
# Set the SPARK_HOME environment variable, if not already set.
# If this variable is already set as a Windows environment variable, this step is not required
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
Sys.setenv(SPARK_HOME = "C:/spark-2.0.0-bin-hadoop2.7")
}
# Load SparkR library
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
# Build a config list: the driver memory to allocate and the warehouse
# directory (a path without spaces) to use at runtime
sparkConf <- list(spark.driver.memory = "2g", spark.sql.warehouse.dir = "C:/tmp")
# Create SparkR Session variable
sparkR.session(master = "local[*]", sparkConfig = sparkConf)
# Convert a built-in R data.frame (the datasets::faithful data) to a SparkDataFrame
DF <- as.DataFrame(faithful)
# Inspect loaded data
head(DF)
With the steps above, I was able to load the data and view it successfully.
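For completeness, here is a minimal sketch of the same workaround applied to the original example from the question. It assumes the Spark setup above; C:/tmp mirrors the answer, and any writable directory without spaces in its path should work:
library(SparkR)
# Set the warehouse directory before the session is created
sparkR.session(appName = "SparkR-DataFrame-example",
               sparkConfig = list(spark.sql.warehouse.dir = "C:/tmp"))
# The original example should now run without the URISyntaxException
localDF <- data.frame(name = c("John", "Smith", "Sarah"), age = c(19, 23, 18))
df <- createDataFrame(localDF)
head(df)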