I need to convert a Cucumber data table into a DataFrame for testing. I'm referring to this GitHub project, which has an example conversion: https://github.com/mdcurran/simple-spark-testing
My Gherkin script:
Given I convert the following data to data frame:
| STUDENT_ID:Int | NAME:String | REMARKS:String |
| 1 | Sam | Pass |
| 2 | Nick | Pass |
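Each header cell carries the column name and its type after a colon. As a standalone illustration of that convention (the `parse` helper here is just for illustration, not part of my code):

```scala
object HeaderFormat {
  // Illustration only: split a "NAME:Type" header cell into (name, type).
  def parse(cell: String): (String, String) = {
    val parts = cell.split(":")
    (parts(0).trim, parts(1).trim.toLowerCase)
  }

  def main(args: Array[String]): Unit =
    println(Seq("STUDENT_ID:Int", "NAME:String", "REMARKS:String").map(parse))
}
```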
My step definition calls a helper method that performs the conversion:
Given("""^I convert the following data to data frame$""") { (data: DataTable) =>
  TestUtils.test(data)
}
Helper method:
import java.util
import cucumber.api.DataTable // or io.cucumber.datatable.DataTable, depending on the Cucumber version
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DataType, DataTypes, StructField, StructType}
import scala.collection.JavaConversions._ // implicit Java-to-Scala collection conversions

object TestUtils extends SparkSessionTestWrapper {

  def test(data: DataTable): Unit = {
    val header = extractColumns(data)
    val schema = generateSchema(header)
    val rows = extractRows(data, schema)
    val baseTableDF = spark.sqlContext.createDataFrame(spark.sparkContext.parallelize(rows), schema)
  }

  // Function 1: works fine; splits the header row into column names and data types.
  def extractColumns(data: DataTable): List[(String, DataType)] = {
    val header: util.List[String] = data.row(0)
    header.map(_.split(":"))
      .map(splits => (splits(0), splits(1).trim.toLowerCase))
      .map {
        case (name, "string") => (name, DataTypes.StringType)
        case (name, "int")    => (name, DataTypes.IntegerType)
        case (name, _)        => throw new IllegalArgumentException(s"$name invalid - provide a valid data type: String | Int")
      }.toList
  }

  // Function 2: builds a StructType from the column list produced by extractColumns.
  def generateSchema(columns: List[(String, DataType)]): StructType = {
    StructType(columns.map {
      case (name, dataType) => StructField(name, dataType)
    })
  }

  // Function 3: uses the schema above and the Cucumber data table to build a List[Row].
  // The "missing parameter type" error occurs in this function, on `.map { row =>`.
  def extractRows(data: DataTable, schema: StructType): List[Row] = {
    data.asMaps(classOf[String], classOf[String])
      .map { row =>
        val values = row
          .values()
          .zip(extractColumns(data))
          .map { case (v, (_, dt)) => (v, dt) }
          .map {
            case (v, DataTypes.StringType)  => v
            case (v, DataTypes.IntegerType) => v.toInt
          }.toSeq
        Row.fromSeq(values)
      }.toList
  }
}
The problem is in the third function, extractRows (Error:(63, 14) missing parameter type  .map { row =>).
Can someone help me understand what's missing?
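A minimal standalone sketch of the same shape (plain java.util collections standing in for what Cucumber's asMaps returns; no Spark or Cucumber on the classpath) does compile once the Java list is converted explicitly with .asScala, so I suspect the missing piece is the collection conversion:

```scala
import java.util
import scala.collection.JavaConverters._

object AsScalaSketch {
  // Hypothetical stand-in for data.asMaps(classOf[String], classOf[String]):
  // a java.util.List of java.util.Map, as Cucumber returns it.
  def sampleRows(): util.List[util.Map[String, String]] = {
    val row = new util.LinkedHashMap[String, String]()
    row.put("STUDENT_ID", "1")
    row.put("NAME", "Sam")
    util.Collections.singletonList(row)
  }

  // With .asScala the compiler sees a Scala Buffer, so the lambda's
  // parameter type is inferred and .map compiles.
  def rowValues(rows: util.List[util.Map[String, String]]): List[List[String]] =
    rows.asScala.map(row => row.values().asScala.toList).toList

  def main(args: Array[String]): Unit =
    println(rowValues(sampleRows())) // prints List(List(1, Sam))
}
```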
Thanks in advance.