Each row of my RDD looks like this:
[{"date":1.533204038E12,"time":1.533204038E12,"num":"KD10617029","type":"item","vat":0}]
My function:
    def writeToES(data: java.util.List[String]): Unit = {
      val conf: SparkConf = new SparkConf().setAppName("ESWriter").setMaster("local")
      val sc: SparkContext = new SparkContext(conf)
      val sql: SQLContext = new SQLContext(sc)
      val spark: SparkSession = sql.sparkSession
      sc.setLogLevel("ERROR")
      import spark.implicits._

      val dataList = data.toArray()
      //println("datalist size: "+dataList.size)
      val dataDF = sc.parallelize(dataList)
        .map(x => x.toString)
        .map(x => x.split(","))
        .map(x => Row.fromSeq(x))
        .map(x => x.mkString(",")).toDF()

      dataDF.show()
      dataDF.take(1).toList.foreach(println)
      println(dataDF.take(1).length)
    }
How do I get the keys from the stringified JSON in the list, and how do I get the values of each JSON object as a row in an RDD (or DataFrame)?
Answer 0 (score: 1)
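As @user238607 suggested, you can convert the strings directly. Alternatively, you can work with an intermediate RDD of JSON strings and hand it to the json() reader.
This creates a DataFrame from the intermediate RDD.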
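A minimal sketch of the intermediate-RDD approach, not necessarily the exact code the answer originally showed. The object name `JsonRddToDataFrame`, the helper `toDataFrame`, and the `local[*]` master are assumptions made for illustration. It also assumes Spark's JSON reader expands a top-level array such as `[{...}]` into one row per object, which is worth verifying on your Spark version.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.collection.JavaConverters._

object JsonRddToDataFrame {

  // Hypothetical helper: turns a Java list of JSON strings into a DataFrame.
  def toDataFrame(spark: SparkSession, data: java.util.List[String]): DataFrame = {
    // Intermediate RDD with one JSON string per element.
    val jsonRdd = spark.sparkContext.parallelize(data.asScala.toSeq)
    // The schema (column names and types) is inferred from the JSON keys.
    // This RDD[String] overload is deprecated since Spark 2.2.0.
    spark.read.json(jsonRdd)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ESWriter")
      .master("local[*]") // assumption: local run for experimentation
      .getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")

    val sample = java.util.Arrays.asList(
      """[{"date":1.533204038E12,"time":1.533204038E12,"num":"KD10617029","type":"item","vat":0}]"""
    )

    val df = toDataFrame(spark, sample)
    df.columns.foreach(println) // the JSON keys become the column names
    df.show(false)              // each JSON object becomes one row of values

    spark.stop()
  }
}
```

With the DataFrame in hand, `df.columns` answers the "keys" part of the question, and `df.rdd` or `df.collect()` gives the values as `Row` objects.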
For Spark >= 2.2.0, pass a Dataset[String] to the json() function instead of an RDD; the RDD overload is deprecated from that version on.
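A sketch of the Dataset variant, under the same assumptions as before: the object name `JsonDatasetToDataFrame` and the helper `toDataFrame` are hypothetical, and the caller is expected to supply an existing `SparkSession`.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import scala.collection.JavaConverters._

object JsonDatasetToDataFrame {

  // Hypothetical helper: same idea as the RDD version, but using Dataset[String],
  // which is the non-deprecated input for json() on Spark >= 2.2.0.
  def toDataFrame(spark: SparkSession, data: java.util.List[String]): DataFrame = {
    import spark.implicits._ // provides the Encoder[String] that createDataset needs

    // One JSON string per Dataset element.
    val jsonDs: Dataset[String] = spark.createDataset(data.asScala.toSeq)

    // Schema is inferred from the JSON keys, as in the RDD version.
    spark.read.json(jsonDs)
  }
}
```

Calling `JsonDatasetToDataFrame.toDataFrame(spark, data)` inside `writeToES` would replace the chain of `map` calls and the `toDF()` step from the question.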