A column in my DataFrame contains structures of the following form:
[{"annotatorType":"pos","begin":0,"end":0,"result":"NNP","metadata":{"word":"D"}},
{"annotatorType":"pos","begin":1,"end":4,"result":"POS","metadata":{"word":"'aww"}},
{"annotatorType":"pos","begin":5,"end":5,"result":".","metadata":{"word":"!"}}]
I am trying to get the values of result and word for each element in this list, but I am having trouble accessing the list's elements.
The type of each element in this column is:
array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>>>
My code is:
case class AnnotatorType(annotatorType: String,
                         begin: Int,
                         end: Int,
                         result: String,
                         metadata: Map[String,String])
def getNouns(rawJson: Seq[AnnotatorType]): Array[String] = {
  var nouns: Array[String] = Array[String]();
  for(i <- 0 to rawJson.length-1) {
    nouns = nouns ++ Array(rawJson(i).result)
  }
  return nouns;
}
val getNounsUDF = udf { s: Seq[AnnotatorType] => getNouns(s)}
testPOS = testPOS.withColumn("subject_nouns", getNounsUDF(testPOS.col("pos")));
display(testPOS)
However, I get an error when I run this.
I think this happens because the data in the DataFrame cannot be converted to the AnnotatorType format, but I am not sure how to fix it.
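If that guess is right, one possible way around it is sketched below. As far as I know, Spark versions before 3.0 hand each struct to a Scala UDF as a `Row` rather than as a case class, so a `Seq[AnnotatorType]` parameter would fail with a cast error. The sketch either avoids the UDF entirely (field access on an array of structs returns an array of that field) or declares the UDF parameter as `Seq[Row]`. The local `SparkSession`, the sample data, and the output column names `subject_words` and `subject_nouns_udf` are assumptions made only for illustration; `transform` requires Spark 2.4 or later, and the script form is meant for spark-shell or a notebook.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, expr, udf}

// Hypothetical case class, used here only to build sample data with the schema from the question.
case class Annotation(annotatorType: String, begin: Int, end: Int,
                      result: String, metadata: Map[String, String])

// Assumed local session; in a notebook an existing `spark` would be used instead.
val spark = SparkSession.builder().master("local[1]").appName("pos-demo").getOrCreate()
import spark.implicits._

val testPOS = Seq(Tuple1(Seq(
  Annotation("pos", 0, 0, "NNP", Map("word" -> "D")),
  Annotation("pos", 1, 4, "POS", Map("word" -> "'aww")),
  Annotation("pos", 5, 5, ".", Map("word" -> "!"))
))).toDF("pos")

// UDF variant: each struct arrives as a Row, so fields are read by name.
val getNounsUDF = udf { (annotations: Seq[Row]) =>
  annotations.map(_.getAs[String]("result"))
}

val withNouns = testPOS
  // No UDF: selecting a field on array<struct> yields array<string>.
  .withColumn("subject_nouns", col("pos.result"))
  // No UDF: look up the "word" key in each element's metadata map.
  .withColumn("subject_words", expr("transform(pos, x -> x.metadata['word'])"))
  // Same result as subject_nouns, computed through the Row-based UDF.
  .withColumn("subject_nouns_udf", getNounsUDF(col("pos")))

withNouns.select("subject_nouns", "subject_words", "subject_nouns_udf").show(false)
```

The built-in column expressions are usually preferable to the UDF, since Spark can optimize them and they handle null arrays without extra code; the UDF variant is shown only because it stays closest to the original approach.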