I want to get started with SparkR by following a tutorial, but I get the following error:
library(SparkR)
Sys.setenv(SPARK_HOME="/Users/myuserhome/dev/spark-2.2.0-bin-hadoop2.7")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"),"R","lib"), .libPaths()))
spark <- sparkR.session(appName = "mysparkr", Sys.getenv("SPARK_HOME"), master = "local[*]")
csvPath <- "file:///Users/myuserhome/dev/spark-data/donation"
mySparkDF <- read.df(csvPath, "csv", header = "true", inferSchema = "true", na.strings = "?")
mySparkDF.show()
But I get:
Error in mySparkDF.show() : could not find function "mySparkDF.show"
Not sure what I am doing wrong; in addition, I don't seem to have functions like read.df(...) available.
Also, if I try
show(describe(mySparkDF))
or
show(summary(mySparkDF))
I get metadata in the result instead of the expected "describe" output:
SparkDataFrame[summary:string, id_1:string, id_2:string, cmp_fname_c1:string, cmp_fname_c2:string, cmp_lname_c1:string, cmp_lname_c2:string, cmp_sex:string, cmp_bd:string, cmp_bm:string, cmp_by:string, cmp_plz:string]
What am I doing wrong?
Answer 0 (score: 1)
show is not used that way in SparkR, and it does something different from the same-name command in PySpark; you should use head or showDF instead:
df <- as.DataFrame(faithful)
show(df)
# result:
SparkDataFrame[eruptions:double, waiting:double]
head(df)
# result:
eruptions waiting
1 3.600 79
2 1.800 54
3 3.333 74
4 2.283 62
5 4.533 85
6 2.883 55
showDF(df)
# result:
+---------+-------+
|eruptions|waiting|
+---------+-------+
| 3.6| 79.0|
| 1.8| 54.0|
| 3.333| 74.0|
| 2.283| 62.0|
| 4.533| 85.0|
| 2.883| 55.0|
| 4.7| 88.0|
| 3.6| 85.0|
| 1.95| 51.0|
| 4.35| 85.0|
| 1.833| 54.0|
| 3.917| 84.0|
| 4.2| 78.0|
| 1.75| 47.0|
| 4.7| 83.0|
| 2.167| 52.0|
| 1.75| 62.0|
| 4.8| 84.0|
| 1.6| 52.0|
| 4.25| 79.0|
+---------+-------+
only showing top 20 rows
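
The same applies to the describe/summary case from your question: describe and summary themselves return new SparkDataFrames (which is why show on them prints only a schema), so you display their rows with the same functions. A minimal sketch, continuing with the faithful example above:

# describe() returns a SparkDataFrame of statistics (count, mean, stddev,
# min, max); show() on it prints only its schema, so render the rows instead:
showDF(describe(df))
# head() collects the first rows into a local R data.frame, if you prefer that:
head(summary(df))
# showDF also accepts a numRows argument when the default 20 rows is not enough:
showDF(df, numRows = 50)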