How do I access a specific element of a struct in a Spark Scala DataFrame?

Time: 2018-06-25 17:30:47

Tags: scala apache-spark apache-spark-sql scala-collections

One column of my DataFrame contains structures of the form:

    [{"annotatorType":"pos","begin":0,"end":0,"result":"NNP","metadata":{"word":"D"}},
    {"annotatorType":"pos","begin":1,"end":4,"result":"POS","metadata":{"word":"'aww"}},
    {"annotatorType":"pos","begin":5,"end":5,"result":".","metadata":{"word":"!"}}]

I am trying to get the result and word values of every element in that list, but I am having trouble accessing the elements of the list.

The type of each element in the column is:

    array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>>>

My code is:

    case class AnnotatorType(annotatorType: String,
                             begin: Int,
                             end: Int,
                             result: String,
                             metadata: Map[String,String])

    def getNouns(rawJson: Seq[AnnotatorType]): Array[String] = {
      var nouns: Array[String] = Array[String]();
      for(i <- 0 to rawJson.length-1) {
        nouns = nouns ++ Array(rawJson(i).result)
      }
      return nouns;
    }

    val getNounsUDF = udf { s: Seq[AnnotatorType] => getNouns(s)}

    testPOS = testPOS.withColumn("subject_nouns", getNounsUDF(testPOS.col("pos")));

    display(testPOS)

However, this code fails with an error.

I think this happens because the data in the DataFrame cannot be converted to the AnnotatorType format, but I am not sure how to fix it.
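That diagnosis looks plausible: as far as I know, Spark hands each struct to a Scala UDF as a Row rather than as a case class instance, so a parameter typed Seq[AnnotatorType] will not match what arrives at runtime. Below is a minimal sketch of one possible workaround, assuming a local SparkSession and the schema shown in the question. The names StructAccessSketch, Annotation, Doc, getResults, getWords and withResultsAndWords are made up for illustration and are not part of any library.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object StructAccessSketch {
  // Hypothetical case classes, used only to build sample data with the
  // same shape as the "pos" column described in the question.
  case class Annotation(annotatorType: String,
                        begin: Int,
                        end: Int,
                        result: String,
                        metadata: Map[String, String])
  case class Doc(pos: Seq[Annotation])

  // Each struct in the array reaches the UDF as a Row, so fields are
  // read by name instead of through a case class.
  val getResults = udf { (rows: Seq[Row]) =>
    rows.map(_.getAs[String]("result"))
  }

  // The map field is read as a scala.collection.Map; a missing "word"
  // key yields null. A null metadata map is not handled in this sketch.
  val getWords = udf { (rows: Seq[Row]) =>
    rows.map(_.getAs[scala.collection.Map[String, String]]("metadata").get("word").orNull)
  }

  // Adds one array<string> column per extracted field.
  def withResultsAndWords(df: DataFrame): DataFrame =
    df.withColumn("results", getResults(col("pos")))
      .withColumn("words", getWords(col("pos")))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this demonstration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("struct-access-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the structure shown in the question.
    val testPOS = Seq(Doc(Seq(
      Annotation("pos", 0, 0, "NNP", Map("word" -> "D")),
      Annotation("pos", 1, 4, "POS", Map("word" -> "'aww")),
      Annotation("pos", 5, 5, ".", Map("word" -> "!"))
    ))).toDF()

    withResultsAndWords(testPOS).select("results", "words").show(false)
    spark.stop()
  }
}
```

If only the result values are needed, selecting col("pos.result") directly should also give an array of strings without any UDF, since Spark can extract a named field from an array of structs.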

0 answers:

No answers