I'm running Spark 2.0 locally (SparkR) with this data:
df <- data.frame(a = c("$0.00 ", "$601.19 ", "$601.19 ", "$238.58 "),
                 b = c("$148.81 ", "$396.85", "$24.37 ", "$24.37 "),
                 c = c("$238.58 ", "$211.15 ", "$422.30 ", "$150.30"))
ddf <- as.DataFrame(df)
I would like to run something like this:
ddf2 <- dapply(ddf, function(x) { regexp_replace(x, "\\$|,", "")}, schema(ddf))
but it fails with this error:
head(ddf2)
ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 13)
org.apache.spark.SparkException: R computation failed with
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘regexp_replace’ for signature ‘"data.frame", "character", "character"’
Answer (score: 1)
Using dapply:
ddf2 <- dapply(ddf, function(x) { as.data.frame(apply(x, MARGIN=2, function(y) gsub("\\$|,", "", y, perl=TRUE)), stringsAsFactors = FALSE) } , schema(ddf))
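dapply expects the anonymous function to return an R data.frame, and it also passes that function a plain R data.frame (one per partition). The regexp_replace method, however, is defined for SparkDataFrame Columns, not for local data.frames, which is exactly what the error's signature ‘"data.frame", "character", "character"’ reports. Inside dapply, use base R gsub instead.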
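A minimal sketch of an alternative function to pass to dapply, assuming the same data as the question. The name clean_currency is hypothetical, and it swaps apply() for lapply(): apply() first converts the data.frame to a character matrix and, for a partition that happens to hold a single row, returns a plain vector, which as.data.frame() would then turn into a data.frame of the wrong shape. Assigning into x[] keeps the columns and their names intact. Because the function uses only base R, it can be checked on a local data.frame without starting Spark:

```r
# Hypothetical helper: strips "$" and "," from every column of a local
# R data.frame. Inside SparkR's dapply, each partition arrives as this
# kind of local data.frame, so base R functions such as gsub apply.
clean_currency <- function(x) {
  # Assigning into x[] keeps x a data.frame with the same column names,
  # even when the partition contains only one row.
  x[] <- lapply(x, function(col) gsub("\\$|,", "", as.character(col)))
  x
}

# Same sample data as the question; stringsAsFactors = FALSE is explicit
# so the columns are character vectors on any R version.
df <- data.frame(a = c("$0.00 ", "$601.19 ", "$601.19 ", "$238.58 "),
                 b = c("$148.81 ", "$396.85", "$24.37 ", "$24.37 "),
                 c = c("$238.58 ", "$211.15 ", "$422.30 ", "$150.30"),
                 stringsAsFactors = FALSE)

cleaned <- clean_currency(df)
print(cleaned)
```

With SparkR loaded, it would be used as ddf2 <- dapply(ddf, clean_currency, schema(ddf)); the returned columns are still strings, so the input schema still matches. The trailing spaces in the sample values are kept; wrapping the gsub call in trimws() would strip them as well.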
Without dapply (replacing only the values of column a):
withColumn(ddf,'a', regexp_replace(ddf$a, "\\$|,", ""))