Create a map from a dataframe in Spark Scala

Date: 2017-09-06 12:20:31

Tags: scala apache-spark

I have a JSON string in a dataframe, as follows:

 {100,[{xxxx:{123,yyy}},{yyyy:{345,zzz}}],2017}
 {200,[{rrrr:{500,qqq}},{iiii:{500,ooo}}],2017}
 {300,[{uuuu:{200,ttt}}],2017}

I would like to create a map from these values.

Please help.

2 answers:

Answer 0 (score: 2)

This works:

 val df = data
    .withColumn("cd", array('ccc, 'ddd)) // create arrays of c and d
    .withColumn("valuesMap", map('bbb, 'cd)) // create mapping
    .withColumn("values", collect_list('valuesMap) // collect mappings
                 .over(Window.partitionBy('aaa)))
    .withColumn("eee", first('eee) // e is constant, so just take the first value over the Window
                 .over(Window.partitionBy('aaa)))
    .select("aaa", "values", "eee") // keep only the columns asked for in the question
    .select(to_json(struct("aaa", "values", "eee")).as("value")) // create JSON

Make sure you have imported org.apache.spark.sql.functions._ and org.apache.spark.sql.expressions._ (the latter provides Window).
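As a cross-check of the grouping logic, here is the same idea sketched on plain Scala collections (not Spark code; the column names aaa/bbb/ccc/ddd/eee and sample values are assumed from the answer and question above):

```scala
// Sketch only: mimic the Spark pipeline above with Scala collections.
case class Row(aaa: Int, bbb: String, ccc: Int, ddd: String, eee: Int)

val rows = Seq(
  Row(100, "xxxx", 123, "yyy", 2017),
  Row(100, "yyyy", 345, "zzz", 2017),
  Row(300, "uuuu", 200, "ttt", 2017)
)

// Group by aaa; for each row build a bbb -> [ccc, ddd] map (like map('bbb, 'cd)),
// collect those maps per group (like collect_list over the Window),
// and keep the constant eee (like first('eee)).
val grouped: Map[Int, (Seq[Map[String, Seq[String]]], Int)] =
  rows.groupBy(_.aaa).map { case (aaa, rs) =>
    (aaa, (rs.map(r => Map(r.bbb -> Seq(r.ccc.toString, r.ddd))), rs.head.eee))
  }
```

Here `grouped(300)` holds the single collected map for key 300 together with its year, which mirrors the per-partition result the Window version produces before `to_json`.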

Answer 1 (score: 0)

You can create a map whose values are either constants, via lit(), or taken from other columns of the dataframe, via $"col_name", like this:

val new_df = df.withColumn("map_feature", map(lit("key1"), lit("value1"), lit("key2"), $"col2"))
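For intuition, map() takes alternating key and value expressions and builds a per-row map column. A minimal plain-Scala sketch of what each row ends up holding (the key names and col2 are the hypothetical ones from the snippet above):

```scala
// Sketch only: the per-row value produced by
// map(lit("key1"), lit("value1"), lit("key2"), $"col2"),
// assuming col2 holds the given string for that row.
def rowMap(col2: String): Map[String, String] =
  Map("key1" -> "value1", "key2" -> col2)

println(rowMap("abc")) // Map(key1 -> value1, key2 -> abc)
```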