Question

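I want to read data from HDFS, but the data may have been saved as saveAsObject[(String,Int,SparseVector)], saveAsObject[(Int,String,Int)], and so on.

So I want to pass the type, e.g. "String,Int,SparseVector", to my job as a command-line argument to spark-submit.

How can I derive the type parameter for saveAsObject[type] from a command-line argument?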

object Test2 {
  def main(args: Array[String]): Unit = {
    val conf = new org.apache.spark.SparkConf()
    val sc = new org.apache.spark.SparkContext(conf)
    var fmt = "Int,String,SparseVector"
    if (args.size != 0) { fmt = args(0) }
    var fmt_arr = fmt.split(",")
    type data_type = (matchClass(fmt_arr(0)), matchClass(fmt_arr(1)), matchClass(fmt_arr(2)))
    val data = sc.objectFile[data_type]("")
  }

  def matchClass(str: String) = {
    str match {
      case "String" => String
      case "Int" => Int
      case "SparseVector" => SparseVector
      case _ => throw new RuntimeException("unsupported type")
    }
  }
}

Answer 1

You can put all of your configuration entries in an application.conf file, as supported by the Lightbend Config library:

https://github.com/lightbend/config

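Then, when you run spark-submit, your job can read this configuration file. The link below shows an example of loading application.conf into an application; the same mechanism should work for a Spark application.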

https://github.com/joesan/plant-simulator/blob/master/app/com/inland24/plantsim/config/AppConfig.scala
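
As a rough illustration of this approach, the sketch below reads a format string from configuration with Lightbend Config and splits it into field type names. The key name `job.format`, the object `FormatConfig`, and the helper `fieldTypes` are hypothetical names chosen for this example; the list of supported names is taken from the question's matchClass. It assumes the `com.typesafe:config` library is on the classpath.

```scala
import com.typesafe.config.{Config, ConfigFactory}

object FormatConfig {
  // Type names the job knows how to handle (from the question's matchClass).
  val Supported: Set[String] = Set("String", "Int", "SparseVector")

  // Reads the hypothetical key `job.format` and splits it into field type names,
  // failing fast with IllegalArgumentException on an unsupported name.
  def fieldTypes(config: Config): Vector[String] = {
    val names = config.getString("job.format").split(",").map(_.trim).toVector
    names.foreach { n =>
      require(Supported.contains(n), s"unsupported type: $n")
    }
    names
  }

  def main(args: Array[String]): Unit = {
    // In a real job, ConfigFactory.load() would pick up application.conf
    // from the classpath. An inline string is parsed here only to keep
    // the sketch self-contained.
    val config = ConfigFactory.parseString("job.format = \"Int,String,SparseVector\"")
    println(fieldTypes(config).mkString(", "))
  }
}
```

Note that the type argument of sc.objectFile[T] is fixed at compile time, so a string read from configuration cannot be turned into a type directly. The job would still have to branch on these names and call objectFile with a concrete tuple type in each branch.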

How to define objectFile[Type](path) in Scala
