Question

I have a case class as follows:
case class MHealthUser(acc_Chest_X: Double, acc_Chest_Y: Double, acc_Chest_Z: Double, activityLabel: Int)

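These fields make up the schema of a Spark DataFrame, which is why I am using a case class. I just want to map them to an Array[String] so that I can use the ParamValidators.inArray(attributes) method in Spark. I use the following code, which relies on reflection, to map the constructor parameters to an array: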

val attributes: Array[String] = MHealthUser.getClass.getConstructors.map(a => a.toString)

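But this only gives me an array of length 1, whereas I want an array of length 4 whose contents are the field names of the dataset schema I defined, as strings. Otherwise I have to hard-code the schema's field names, which is clearly inelegant. In other words, I want the output: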

val attributes: Array[String] = Array("acc_Chest_X", "acc_Chest_Y", "acc_Chest_Z", "activityLabel")

I have been playing with this for a while and cannot get it to work. Any ideas are appreciated. Thanks!

Answer 1

I would use ScalaReflection:

import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType

ScalaReflection.schemaFor[MHealthUser].dataType match {
  case s: StructType => s.fieldNames
  case _ => Array[String]()
}
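
As a possible alternative, the same names can probably be obtained through the public Encoders API rather than the catalyst-internal ScalaReflection. The following is only a sketch, assuming it is pasted into spark-shell with the Spark SQL and ML modules on the classpath; the name isValidAttribute is made up for illustration.

```scala
import org.apache.spark.ml.param.ParamValidators
import org.apache.spark.sql.Encoders

case class MHealthUser(acc_Chest_X: Double, acc_Chest_Y: Double, acc_Chest_Z: Double, activityLabel: Int)

// Derive the schema from the case class, then read off its field names.
val attributes: Array[String] = Encoders.product[MHealthUser].schema.fieldNames

// inArray returns a predicate that checks membership in the given array.
// isValidAttribute is a hypothetical name used only in this sketch.
val isValidAttribute: String => Boolean = ParamValidators.inArray(attributes)
```

Either way, the field names come from the case class itself, so nothing has to be hard-coded.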

Outside Spark, see "Scala. Get field names list from case class".
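
For the non-Spark route, here is a minimal sketch using plain JVM reflection. Note that in the question, MHealthUser.getClass refers to the companion object rather than the case class, and getConstructors lists constructors, not parameter names. The sketch assumes a top-level case class. The JVM does not formally guarantee the order of declared fields, although in practice it usually matches declaration order. CaseClassFields and fieldNames are hypothetical names.

```scala
case class MHealthUser(acc_Chest_X: Double, acc_Chest_Y: Double, acc_Chest_Z: Double, activityLabel: Int)

object CaseClassFields {
  // Each case class constructor parameter is backed by a field of the same name.
  // Compiler-generated fields (for example $outer on nested classes) contain '$'
  // and are dropped as a precaution.
  def fieldNames(cls: Class[_]): Array[String] =
    cls.getDeclaredFields.map(_.getName).filterNot(_.contains("$"))
}
```

It would be called with classOf, which yields the class of the case class itself rather than of its companion object: val attributes = CaseClassFields.fieldNames(classOf[MHealthUser]).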

Convert case class constructor parameters to a String Array in Scala
