I am using Spark 1.5.1.
When I do this:

df <- createDataFrame(sqlContext, iris)
# create a new indicator column for the category "setosa"
df$Species1 <- ifelse(df[[5]] == "setosa", 1, 0)
head(df)
Output: the new column is created
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
But when I save the iris dataset as a CSV file and try to read it back in as a SparkR DataFrame:
df <- read.df(sqlContext, "/Users/devesh/Github/deveshgit2/bdaml/data/iris/",
              source = "com.databricks.spark.csv", header = "true", inferSchema = "true")
and then try to create the new column:

df$Species1 <- ifelse(df[[5]] == "setosa", 1, 0)
I get the following error:
16/02/05 12:11:01 ERROR RBackendHandler: col on 922 failed Error in select(x, x$"*", alias(col, colName)) :
error in evaluating the argument 'col' in selecting a method for function 'select': Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
org.apache.spark.sql.AnalysisException: Cannot resolve column name "Sepal.Length" among (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species);
at org.apache.spark.s
Answer 0 (score: 0)
Spark SQL does not support column names with embedded dots: a dot in a name like "Sepal.Length" is interpreted as struct-field access, which is why the column "cannot be resolved" even though it appears in the listed names. When you use createDataFrame the names are adjusted for you automatically; with other input methods you have to provide a schema explicitly:
schema <- structType(
  structField("Sepal_Length", "double"),
  structField("Sepal_Width", "double"),
  structField("Petal_Length", "double"),
  structField("Petal_Width", "double"),
  structField("Species", "string"))
df <- read.df(sqlContext, path, source = "com.databricks.spark.csv",
              header = "true", schema = schema)
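As an alternative sketch (untested here, and assuming SparkR's withColumnRenamed and columns functions behave as in Spark 1.5), you could keep inferSchema = "true" and rename the dotted columns right after reading, since withColumnRenamed matches the existing name as a literal string rather than resolving it as an expression:

# read with the inferred schema, then replace dots with underscores in every column name
df <- read.df(sqlContext, path, source = "com.databricks.spark.csv",
              header = "true", inferSchema = "true")
for (name in columns(df)) {
  df <- withColumnRenamed(df, name, gsub(".", "_", name, fixed = TRUE))
}
# with the dots gone, column expressions resolve normally
df$Species1 <- ifelse(df[[5]] == "setosa", 1, 0)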