I have the following function, which works fine in the REPL. Basically what it does is check the data type of the schema and match it to the column later on, when the file is flattened and matched with zipWithIndex:
//Match a Schema to a Column value
def schemaMatch(x: Array[String]) = {
  var accum = 0
  for (i <- 0 until x.length) {
    val convert = x(i).toString.toUpperCase
    println(convert)
    val split = convert.split(' ')
    println(split.mkString(" "))
    matchTest(split(1), accum)
    accum += 1
  }
  def matchTest(y: String, z: Int) = y match {
    case "STRING"  => strBuf += z
    case "INTEGER" => decimalBuf += z
    case "DECIMAL" => decimalBuf += z
    case "DATE"    => dateBuf += z
  }
}
schemaMatch(schema1)
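(The strBuf, decimalBuf and dateBuf collections are declared elsewhere in my class; a minimal sketch of what they look like, with assumed declarations added here only so the snippet is self-contained, would be:)

import scala.collection.mutable.ArrayBuffer

// Mutable buffers holding the column indices found for each data type
val strBuf     = ArrayBuffer[Int]()
val decimalBuf = ArrayBuffer[Int]()
val dateBuf    = ArrayBuffer[Int]()

// schema1 is assumed to hold "<columnName> <type>" strings, for example:
val schema1 = Array("name string", "amount decimal", "created date")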
The error I'm getting:
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.IntRef.create(I)Lscala/runtime/IntRef;
at com.capitalone.DTS.dataProfiling$.schemaMatch$1(dataProfiling.scala:112)
at com.capitalone.DTS.dataProfiling$.main(dataProfiling.scala:131)
at com.capitalone.DTS.dataProfiling.main(dataProfiling.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Line 112 is:
var accum = 0
Any ideas why it no longer works once compiled, even though it works in the REPL, and how to correct it?
Answer (score: 1)
With the code you have given us it is hard to see the underlying problem causing the NoSuchMethodError, but you could simplify the way you extract the column types and their corresponding indices:
def schemaMatch(schema: Array[String]): Map[String, List[Int]] =
  schema
    // get the 2nd word (the column type) in upper case
    .map(columnDescr => columnDescr.split(' ')(1).toUpperCase)
    // combine each column type with its index
    .zipWithIndex
    // group by column type
    .groupBy { case (colType, index) => colType }
    // keep only the indices
    .mapValues(columnIndices => columnIndices.map(_._2).toList)
It can be used like this:
val columns = Array("x string", "1 integer", "2 decimal", "2015 date")
val columnTypeMap = schemaMatch(columns)
//Map(DATE -> List(3), STRING -> List(0), DECIMAL -> List(2), INTEGER -> List(1))
val strIndices = columnTypeMap.getOrElse("STRING", Nil)
// List(0)
val decimalIndices = columnTypeMap.getOrElse("INTEGER", Nil) :::
                     columnTypeMap.getOrElse("DECIMAL", Nil)
// List(1, 2)
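As a sketch of how those index lists could then be applied when you flatten the file, here is one possibility, using a hypothetical delimited record whose fields follow the same order as the schema (the record value and delimiter below are assumptions, not part of your code):

// Hypothetical flattened record in schema order
val record = "x,1,2.5,2015-01-01".split(',')

// Pick the values of each type out of the record via the index lists
val stringValues  = strIndices.map(record(_))      // List(x)
val decimalValues = decimalIndices.map(record(_))  // List(1, 2.5)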