Spark 2.1: "missing parameter type" error when converting a Cucumber data table to a DataFrame

Time: 2020-04-06 17:04:03

Tags: apache-spark cucumber

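I need to convert a Cucumber data table into a DataFrame for testing. I am following a GitHub project that contains an example of this conversion: https://github.com/mdcurran/simple-spark-testing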

My Gherkin step:

Given I convert the following data to data frame
  | STUDENT_ID:Int | NAME:String | REMARKS:String |
  | 1              | Sam         | Pass           |
  | 2              | Nick        | Pass           |
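The header convention used in this table (one `name:Type` cell per column, with only `String` and `Int` allowed) can be illustrated without Spark or Cucumber. The following is a stdlib-only sketch; `parseHeader` and `convertRow` are hypothetical helpers, the type is kept as a lower-cased string tag instead of a Spark `DataType`, and each row becomes a `Seq[Any]` instead of a Spark `Row`:

```scala
object HeaderConventionSketch {
  // Split each header cell such as "STUDENT_ID:Int" into (name, lower-cased type tag).
  // Only "string" and "int" are accepted, mirroring the rule in extractColumns.
  def parseHeader(cells: Seq[String]): Seq[(String, String)] =
    cells.map { cell =>
      cell.split(":") match {
        case Array(name, tpe) if Set("string", "int").contains(tpe.trim.toLowerCase) =>
          (name.trim, tpe.trim.toLowerCase)
        case _ =>
          throw new IllegalArgumentException(s"$cell invalid - provide a valid data type: String | Int")
      }
    }

  // Convert one body row (raw cell strings) into typed values, column by column.
  def convertRow(header: Seq[(String, String)], cells: Seq[String]): Seq[Any] =
    cells.zip(header).map {
      case (v, (_, "int")) => v.trim.toInt
      case (v, _)          => v.trim
    }

  def main(args: Array[String]): Unit = {
    val header = parseHeader(Seq("STUDENT_ID:Int", "NAME:String", "REMARKS:String"))
    val rows = Seq(Seq("1", "Sam", "Pass"), Seq("2", "Nick", "Pass")).map(convertRow(header, _))
    println(header)
    println(rows)
  }
}
```

Pairing each cell with its header entry by position keeps the conversion independent of any map ordering; it assumes every body row has exactly as many cells as the header.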

My step definition calls a helper method that performs the conversion:

Given("""^I convert the following data to data frame$"""){ (data: DataTable) =>
  TestUtils.test(data)
}

Helper methods:

// Imports assumed for this snippet; they are not shown in the original question.
// SparkSessionTestWrapper (not shown) provides the `spark` session.
import java.util

import io.cucumber.datatable.DataTable
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DataType, DataTypes, StructField, StructType}

import scala.collection.JavaConversions._

object TestUtils extends SparkSessionTestWrapper {

  def test(data: DataTable): Unit = {
    val header = extractColumns(data)
    val schema = generateSchema(header)
    val rows = extractRows(data, schema)
    val baseTableDF = spark.sqlContext.createDataFrame(spark.sparkContext.parallelize(rows), schema)
  }

  // Function 1: works fine; splits each header cell into a column name and a data type.
  def extractColumns(data: DataTable): List[(String, DataType)] = {
    val header: util.List[String] = data.row(0)
    header.map(_.split(":"))
      .map(splits => (splits(0), splits(1).trim.toLowerCase))
      .map {
        case (name, "string") => (name, DataTypes.StringType)
        case (name, "int") => (name, DataTypes.IntegerType)
        case (name, _) => throw new IllegalArgumentException(s"$name invalid - provide a valid data type: String | Int")
      }.toList
  }

  // Function 2: successfully builds a StructType from the list of columns returned by extractColumns.
  def generateSchema(columns: List[(String, DataType)]): StructType = {
    StructType(columns.map {
      case (name, dataType) => StructField(name, dataType)
    })
  }

  // Function 3: uses the schema above and the Cucumber data table to build a List[Row].
  // This is where the "missing parameter type" compile error is reported.
  def extractRows(data: DataTable, schema: StructType): List[Row] = {
    data.asMaps(classOf[String], classOf[String])
      .map { row => // error: missing parameter type
        val values = row
          .values()
          .zip(extractColumns(data))
          .map { case (v, (_, dt)) => (v, dt) }
          .map {
            case (v, DataTypes.StringType) => v
            case (v, DataTypes.IntegerType) => v.toInt
          }.toSeq
        Row.fromSeq(values)
      }.toList
  }

}

The problem is in the third function, extractRows (Error:(63, 14) missing parameter type, reported at `.map { row =>`).

Can someone help me understand what is missing?

Thanks in advance.
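A likely cause of the error in `extractRows`, offered as an assumption rather than a confirmed diagnosis: if the `DataTable` in use is `io.cucumber.datatable.DataTable`, `asMaps` is declared as `<K, V> List<Map<K, V>> asMaps(Type keyType, Type valueType)`. Because `K` and `V` appear only in the return type, Scala cannot infer them from the `classOf[String]` arguments, so the type of `row` in `.map { row => ... }` is left undetermined. Supplying the type arguments explicitly is the usual workaround. The sketch below avoids the Cucumber dependency: `fakeAsMaps` is a hypothetical stand-in with the same signature shape, and `names` mirrors the `.map { row => ... }` call:

```scala
import java.lang.reflect.Type
import java.util.{ArrayList => JArrayList, LinkedHashMap => JLinkedHashMap, List => JList, Map => JMap}

import scala.collection.JavaConverters._

object ExplicitTypeArgsSketch {
  // Hypothetical stand-in shaped like DataTable.asMaps(Type, Type): K and V occur only
  // in the result type. The Type arguments are ignored; the rows are hard-coded.
  def fakeAsMaps[K, V](keyType: Type, valueType: Type): JList[JMap[K, V]] = {
    val rows = new JArrayList[JMap[K, V]]()
    for ((id, name) <- Seq(("1", "Sam"), ("2", "Nick"))) {
      val row = new JLinkedHashMap[String, String]()
      row.put("STUDENT_ID:Int", id)
      row.put("NAME:String", name)
      row.put("REMARKS:String", "Pass")
      rows.add(row.asInstanceOf[JMap[K, V]]) // unchecked cast, acceptable in a stand-in
    }
    rows
  }

  // Explicit [String, String] fixes K and V, so `row` has the concrete
  // type JMap[String, String] and the lambda type-checks.
  def names(): List[String] =
    fakeAsMaps[String, String](classOf[String], classOf[String]).asScala
      .map { row => row.get("NAME:String") }
      .toList

  def main(args: Array[String]): Unit = println(names())
}
```

Applied to the question's code, the corresponding change would be `data.asMaps[String, String](classOf[String], classOf[String])` in `extractRows`; this is untested against the Cucumber and Scala versions used in the question.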
