Spark Scala - Converting a nested StructType to a Map

Time: 2017-10-04 13:37:34

Tags: scala apache-spark elasticsearch spark-dataframe

I am using Spark 1.6 with Scala.

I created an index in Elasticsearch that contains an object. The object "params" is created as a Map[String, Map[String, String]]. Example:

val params: Map[String, Map[String, String]] = Map("p1" -> Map("p1_detail" -> "table1"), "p2" -> Map("p2_detail" -> "table2", "p2_filter" -> "filter2"), "p3" -> Map("p3_detail" -> "table3"))

This gives me records that look like the following:

{
        "_index": "x",
        "_type": "1",
        "_id": "xxxxxxxxxxxx",
        "_score": 1,
        "_timestamp": 1506537199650,
        "_source": {
           "a": "toto",
           "b": "tata",
           "c": "description",
           "params": {
              "p1": {
                 "p1_detail": "table1"
              },
              "p2": {
                 "p2_detail": "table2",
                 "p2_filter": "filter2"
              },
              "p3": {
                 "p3_detail": "table3"
              }
           }
        }
     },

I then try to read the Elasticsearch index in order to update values.

Spark reads the index with the following schema:

|-- a: string (nullable = true)
|-- b: string (nullable = true)
|-- c: string (nullable = true)
|-- params: struct (nullable = true)
|    |-- p1: struct (nullable = true)
|    |    |-- p1_detail: string (nullable = true)
|    |-- p2: struct (nullable = true)
|    |    |-- p2_detail: string (nullable = true)
|    |    |-- p2_filter: string (nullable = true)
|    |-- p3: struct (nullable = true)
|    |    |-- p3_detail: string (nullable = true)

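My problem is that the object is read as a struct. To manage and update the fields easily, I would like to have them as a Map, because I am not very familiar with StructType.

I tried to receive the object as a Map in a UDF, but I get the following error: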

 User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(params)' due to data type mismatch: argument 1 requires map<string,map<string,string>> type, however, 'params' is of struct<p1:struct<p1_detail:string>,p2:struct<p2_detail:string,p2_filter:string>,p3:struct<p3_detail:string>> type.;

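UDF code snippet:

val getSubField: Map[String, Map[String, String]] => String = (params: Map[String, Map[String, String]]) => { val return_string = params("p1").getOrElse("p1_detail", null.asInstanceOf[String]); return_string }

My question: how can this struct be converted to a Map? I have read about the toMap method in the documentation, but I could not work out how to use it (I am not very familiar with implicit parameters), as I am a Scala beginner.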

Thanks in advance,

2 answers:

Answer 0: (score: 1)

I finally solved it as follows:

for /F "tokens=*" %a in ('wmic datafile where "Name=C:\\Program Files (x86)\\App\\name.exe" get version') do set pver=%a

Answer 1: (score: 0)

You cannot specify the parameter type as a StructType object; specify the type as Row instead.

//Schema of parameter
def schema:StructType = (new StructType).add("p1", (new StructType).add("p1_detail", StringType))
      .add("p2", (new StructType).add("p2_detail", StringType).add("p2_filter",StringType))
      .add("p3", (new StructType).add("p3_detail", StringType))

 //Not allowed
 val extractVal: schema => collection.Map[Nothing, Nothing] = _.getMap(0)

Solution:

// UDF example to process struct column
val extractVal: (Row) => collection.Map[Nothing, Nothing] = _.getMap(0)

// You would implement something similar
// (requires: import org.apache.spark.sql.Row)
val getSubField: Row => String =
  (params: Row) =>
  {
    // A struct column reaches the UDF as a Row, so nested structs are Rows too
    val p1 = params.getAs[Row]("p1")
    // .........
    null
  }
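
As one possible way to fill in the elided part, the sketch below converts the nested struct into a Map[String, Map[String, String]] by walking the schema attached to each Row. It assumes the rows come from a DataFrame (so `row.schema` is populated) and that every leaf value is a string. The names `rowToMap`, `structToNestedMap`, `structToNestedMapUdf`, `df` and `params_map` are hypothetical and do not come from the original answer.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Hypothetical helper: turn one struct level into a Map, skipping null fields.
// Relies on the Row carrying its schema, which is an assumption here.
def rowToMap[T](row: Row): Map[String, T] =
  row.schema.fieldNames
    .filter(name => !row.isNullAt(row.fieldIndex(name)))
    .map(name => name -> row.getAs[T](name))
    .toMap

// Hypothetical function: convert the outer struct first, then each inner struct.
// Returns null when the whole "params" column is null.
val structToNestedMap: Row => Map[String, Map[String, String]] =
  (params: Row) =>
    if (params == null) null
    else rowToMap[Row](params).map { case (key, inner) => key -> rowToMap[String](inner) }

// Wrap it as a UDF so it can be applied to the struct column.
val structToNestedMapUdf = udf(structToNestedMap)

// Possible usage, assuming a DataFrame named df with the schema shown in the question:
// val withMap = df.withColumn("params_map", structToNestedMapUdf(df("params")))
```

Because null fields are filtered out, keys that are absent from a given document (for example `p2_filter`) simply do not appear in the resulting inner map, so a lookup with `getOrElse` behaves as in the question's original UDF.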

I hope this helps!