Question

I have a function that I want to apply to every row of a .csv file:

import org.apache.spark.sql.functions.lit

def convert(inString: Array[String]): String = {
  val country  = inString(0)
  val sellerId = inString(1)
  val itemID   = inString(2)
  try {
    // Parse the JSON blob in column 4 and attach the first three columns to it.
    val minidf = sqlContext.read.json(sc.makeRDD(inString(3) :: Nil))
      .withColumn("country", lit(country))
      .withColumn("seller_id", lit(sellerId))
      .withColumn("item_id", lit(itemID))
    minidf.toJSON.collect().mkString(",")
  } catch {
    case e: Exception =>
      println("AN EXCEPTION " + inString.mkString(","))
      "this is an exception " + e + "  " + inString.mkString(",")
  }
}

This function converts entries like this one:

CA      112578240       132080411845    [{"id":"general_spam_policy","severity":"critical","timestamp":"2017-02-26T08:30:16Z"}]

There are 4 columns, the 4th being a JSON blob. The function turns such a row into:

[{"country":"CA", "seller":112578240, "product":112578240, "id":"general_spam_policy","severity":"critical","timestamp":"2017-02-26T08:30:16Z"}]

That is, the JSON object with the first 3 columns inserted into the 4th.

Now, this works:

val conv_string = sc.textFile(path_to_file).map(_.split('\t')).collect().map(x => convert(x))

Or this:

val conv_string = sc.textFile(path_to_file).map(_.split('\t')).take(10).map(x => convert(x))

But this does not:

val conv_string = sc.textFile(path_to_file).map(_.split('\t')).map(x => convert(x))

The last one throws a java.lang.NullPointerException.

I added the try/catch clause to see exactly where this fails, and it fails for every single row.

What am I doing wrong here?

Answer 1

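You cannot use sqlContext or sparkContext inside a Spark map, because those objects can exist only on the driver node. Basically, they are in charge of distributing your tasks. The function you pass to RDD.map runs on the executors, where the contexts are not available, hence the NullPointerException. The collect() and take(10) versions work because both return a local Array to the driver, so the map that follows is an ordinary Scala Array.map running on the driver.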

You can rewrite the JSON parsing bit using one of the pure Scala libraries described here: https://manuel.bernhardt.io/2015/11/06/a-quick-tour-of-json-libraries-in-scala/
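
As a rough illustration of such a rewrite, here is a minimal sketch that assumes json4s, which Spark itself depends on and so is normally already on the classpath. The name convertPure and the choice to prepend the three columns to each object are assumptions made for this example, not part of the original code. The output also differs slightly from what the DataFrame version produces, since the surrounding array brackets are kept and the new fields come first.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods.{compact, parse, render}

// Hypothetical replacement for convert: it touches neither sqlContext nor sc,
// so it can run inside RDD.map on the executors.
def convertPure(fields: Array[String]): String = {
  // The first three columns, as JSON string fields to prepend to each object.
  val extra: List[JField] = List(
    "country"   -> JString(fields(0)),
    "seller_id" -> JString(fields(1)),
    "item_id"   -> JString(fields(2))
  )

  // Prepend the extra fields to a JSON object; leave any other value untouched.
  def withExtra(v: JValue): JValue = v match {
    case JObject(existing) => JObject(extra ++ existing)
    case other             => other
  }

  try {
    // The blob is expected to be an array of objects, but a single object is handled too.
    val merged = parse(fields(3)) match {
      case JArray(items) => JArray(items.map(withExtra))
      case single        => withExtra(single)
    }
    compact(render(merged))
  } catch {
    // Same fallback idea as the original: report the offending row instead of failing.
    case e: Exception => "this is an exception " + e + "  " + fields.mkString(",")
  }
}
```

With something like this in place, the failing line from the question would become sc.textFile(path_to_file).map(_.split('\t')).map(convertPure), with no context used inside the closure.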
