我有一个火花数据帧,我想处理每一行的每个句子(更低,删除标点符号)。
更具体一点:
|text |
+---------------------------------
| [this is text ][i want to split]
+---------------------------------
我想得到这个数据帧:
import org.apache.spark.h2o._
import org.apache.spark._
import org.apache.spark.SparkContext._
object App1 extends App{
val conf = new SparkConf()
conf.setAppName("Test")
conf.setMaster("local[1]")
conf.set("spark.executor.memory","1g");
val sc = new SparkContext(conf)
val rawData = sc.textFile("c:\\spark\\data.csv")
val data = rawData.map(line => line.split(',').map(_.toDouble))
val response: RDD[Int] = data.map(row => row(0).toInt)
val h2oResponse: H2OFrame = response // <-- this line throws the error
sc.stop
}