Spark ES-Hadoop plugin: JSON data

Date: 2017-09-24 18:07:20

Tags: scala apache-spark elasticsearch elasticsearch-hadoop

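    import org.apache.spark.sql.functions.{get_json_object, to_json}
    import spark.implicits._ // enables the $"column" syntax

    val ordersDF = spark.read.schema(revenue_schema).format("csv").load("s3://xxxx/fifa/pocs/smallMetrics.csv")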
    val product_df = spark.read.json("s3://xxxx/fifa/pocs/smallCatalogue.json").toDF("id", "product", "style_id")
    val product_json_df = product_df.select($"style_id",to_json($"product").alias("product"))

    val product_final_df = product_json_df.select($"style_id", get_json_object(($"product"), "$.brand").alias("brand")
      , get_json_object(($"product"), "$.gender").alias("gender")
      , get_json_object(($"product"), "$.article_type").alias("article_type")
      , get_json_object(($"product"), "$.business_unit").alias("business_unit")
      , get_json_object(($"product"), "$.season").alias("season")
      , get_json_object(($"product"), "$.season_code").alias("season_code")
      , get_json_object(($"product"), "$.brand_code").alias("brand_code")
      , get_json_object(($"product"), "$.style_catalogued_date").alias("style_catalogued_date")
      , get_json_object(($"product"), "$.base_colour").alias("base_colour")
      , get_json_object(($"product"), "$.image").alias("image")
      , get_json_object(($"product"), "$.image_array").alias("image_array")
      , get_json_object(($"product"), "$.MRP").alias("mrp")
      , get_json_object(($"product"), "$.attrs").alias("product_attributes")
    )
    product_final_df.show(false)

    |style_id|brand          |gender|article_type|business_unit       |season|season_code|brand_code|style_catalogued_date|base_colour|image|image_array                         |mrp |product_attributes                                                                                                                                                                                                                                                           |
    +--------+---------------+------+------------+--------------------+------+-----------+----------+---------------------+-----------+-----+------------------------------------+----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |2024270 |Marks & Spencer|Women |Jeans       |International Brands|Fall  |FW17       |MKSP      |null                 |Khaki      |null |[null,null,null,null,null,null,null]|2299|{"ALL":"STYLES","Add-Ons":"NA","Brand Fit Name":"NA","Closure":"Button and Zip","Distress":"Clean Look","Fabric":"Cotton","Fade":"No Fade","Features":"NA","Fit":"Super Skinny Fit","Occasion":"Casual","Shade":"Dark","Waist Rise":"Mid-Rise","Waistband":"With belt loops"}|
    |2023709 |Bossini        |Boys  |Tshirts     |Kids Wear           |Fall  |FW17       |BILE      |null                 |NA         |null |[null,null,null,null,null,null,null]|599 |{"ALL":"STYLES","Fabric":"Polyester","Fabric Type":"Single jersey","Fit":"Regular Fit","Multipack Set":"Single","Neck":"Henley Neck","Pattern":"Solid","Pattern Coverage":"NA","Print or Pattern Type":"Solid","Sleeve Length":"Long Sleeves","Surface Styling":"NA"}        |
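    |2024333 |Marks & Spencer|Women |Tops        |International Brands|Fall  |FW17       |MKSP      |null                 |null       |null |[null,null,null,null,null,null,null]|1999|{"ALL":"STYLES","Fabric":"Polyester","Neck":"Round Neck","Pattern":"Solid","Print or Pattern Type":"Solid","Sleeve Length":"Short Sleeves","Sleeve Styling":"Flared Sleeves","Surface Styling":"NA","Type":"Regular","Weave Type":"Knitted"}                                 |
    +--------+---------------+------+------------+--------------------+------+-----------+----------+---------------------+-----------+-----+------------------------------------+----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

val product_metrics_df = ordersDF.join(product_final_df,"style_id")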
product_metrics_df.show(false)

+--------+--------+------+-------+--------+----------------+---------------+--------------+----------+-----------------+---------+---------------+------+------------+--------------------+------+-----------+----------+---------------------+-----------+-----+------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|style_id|date    |mrp   |revenue|quantity|product_discount|coupon_discount|total_discount|list_count|add_to_cart_count|pdp_count|brand          |gender|article_type|business_unit       |season|season_code|brand_code|style_catalogued_date|base_colour|image|image_array                         |product_attributes                                                                                                                                                                                                                                                           |
+--------+--------+------+-------+--------+----------------+---------------+--------------+----------+-----------------+---------+---------------+------+------------+--------------------+------+-----------+----------+---------------------+-----------+-----+------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|2024270 |20170101|1000.0|1000.0 |1000    |1000.0          |1000.0         |1000.0        |1000      |2000             |2000     |Marks & Spencer|Women |Jeans       |International Brands|Fall  |FW17       |MKSP      |null                 |Khaki      |null |[null,null,null,null,null,null,null]|{"ALL":"STYLES","Add-Ons":"NA","Brand Fit Name":"NA","Closure":"Button and Zip","Distress":"Clean Look","Fabric":"Cotton","Fade":"No Fade","Features":"NA","Fit":"Super Skinny Fit","Occasion":"Casual","Shade":"Dark","Waist Rise":"Mid-Rise","Waistband":"With belt loops"}|
|2024333 |20170101|1000.0|1000.0 |1000    |1000.0          |1000.0         |1000.0        |1000      |2000             |2000     |Marks & Spencer|Women |Tops        |International Brands|Fall  |FW17       |MKSP      |null                 |null       |null |[null,null,null,null,null,null,null]|{"ALL":"STYLES","Fabric":"Polyester","Neck":"Round Neck","Pattern":"Solid","Print or Pattern Type":"Solid","Sleeve Length":"Short Sleeves","Sleeve Styling":"Flared Sleeves","Surface Styling":"NA","Type":"Regular","Weave Type":"Knitted"}                                 |
|2023709 |20170101|1000.0|1000.0 |1000    |1000.0          |1000.0         |1000.0        |1000      |2000             |2000     |Bossini        |Boys  |Tshirts     |Kids Wear           |Fall  |FW17       |BILE      |null                 |NA         |null |[null,null,null,null,null,null,null]|{"ALL":"STYLES","Fabric":"Polyester","Fabric Type":"Single jersey","Fit":"Regular Fit","Multipack Set":"Single","Neck":"Henley Neck","Pattern":"Solid","Pattern Coverage":"NA","Print or Pattern Type":"Solid","Sleeve Length":"Long Sleeves","Surface Styling":"NA"}        |
+--------+--------+------+-------+--------+----------------+---------------+--------------+----------+-----------------+---------+---------------+------+------------+--------------------+------+-----------+----------+---------------------+-----------+-----+------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


product_metrics_df.saveToEs(elasticConf)

When the product_attributes column is written to ES, its content is escaped with backslashes and wrapped in double quotes:

product_attributes  "{\"ALL\":\"STYLES\",\"Add-Ons\":\"NA\",\"Brand Fit Name\":\"NA\",\"Closure\":\"Button and Zip\",\"Distress\":\"Clean Look\",\"Fabric\":\"Cotton\",\"Fade\":\"No Fade\",\"Features\":\"NA\",\"Fit\":\"Super Skinny Fit\",\"Occasion\":\"Casual\",\"Shade\":\"Dark\",\"Waist Rise\":\"Mid-Rise\",\"Waistband\":\"With belt loops\"}"

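Is there any way to keep the JSON from being escaped with backslashes? Because of the escaping, the key-value pairs under product_attributes are not indexed individually; the value is not a valid JSON object, so ES interprets it as a single String field.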
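The escaping itself can be reproduced without Spark or ES. The following is a minimal sketch in plain Scala, not the connector's actual serializer: `jsonString`, `asStringField` and `asObjectField` are hypothetical helpers introduced only for illustration. It shows that JSON text placed into a document as a string value must have its quotes escaped, while the same text placed as a nested object is left untouched. Since `get_json_object` returns a string column, the first case is the one that applies to product_attributes.

```scala
object EscapeDemo {
  // Minimal JSON string encoder for illustration: escapes only backslashes and double quotes.
  def jsonString(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // The attribute text is treated as a string value, as happens for a string column.
  def asStringField(attrs: String): String =
    "{\"product_attributes\":" + jsonString(attrs) + "}"

  // The attribute text is spliced in as a nested JSON object.
  def asObjectField(attrs: String): String =
    "{\"product_attributes\":" + attrs + "}"

  def main(args: Array[String]): Unit = {
    val attrs = """{"Fabric":"Cotton"}"""
    println(asStringField(attrs)) // {"product_attributes":"{\"Fabric\":\"Cotton\"}"}
    println(asObjectField(attrs)) // {"product_attributes":{"Fabric":"Cotton"}}
  }
}
```

The first printed line has the same shape as the value that ends up in ES, which suggests the column type, rather than the index template, is what needs to change.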

I also wrote the DataFrame to S3 to cross-check whether the product_attributes data gets escaped, and there too the JSON is escaped with backslash characters.

product_metrics_df.write.json("s3://xxxxx/fifa/pocs/output.csv")
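Because Spark's own JSON writer produces the same escaping, the string type of the column is the likely cause, independent of ES. One direction that may be worth trying is to keep `attrs` as a nested struct instead of extracting it with `get_json_object`. The sketch below is an assumption-laden illustration, not a verified fix for this pipeline: it assumes Spark 2.2 or later (for `spark.read.json` on a `Dataset[String]`), uses a hypothetical one-record sample in place of smallCatalogue.json, and the names `NestedAttrsSketch` and `selectNested` are invented here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object NestedAttrsSketch {
  // Keep `attrs` as a nested struct instead of extracting it as a JSON string.
  def selectNested(productDf: DataFrame): DataFrame =
    productDf.select(
      col("style_id"),
      col("product.brand").alias("brand"),
      col("product.attrs").alias("product_attributes") // struct column, not a string
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-attrs-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical one-record stand-in for smallCatalogue.json.
    val sample = Seq(
      """{"style_id":2024270,"product":{"brand":"Marks & Spencer","attrs":{"Fabric":"Cotton","Fit":"Super Skinny Fit"}}}"""
    ).toDS()

    val nested = selectNested(spark.read.json(sample))
    nested.printSchema()                     // product_attributes is reported as a struct
    nested.toJSON.collect().foreach(println) // attrs are rendered as a nested object
    spark.stop()
  }
}
```

One caveat with this approach: schema inference builds a single struct from the union of all attribute keys, so a record that lacks a given key carries null for that field.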

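ES index template: https://pastebin.com/e4tmATHE

I am able to write the data into ES using Spark with Python, so the ES index template itself appears to be fine.

I tried another approach: I built the JSON with the json4s library and then wrote that JSON to ES, but I ran into the same problem here too.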

  // Imports assume the json4s Jackson backend that ships with Spark.
  import org.apache.spark.sql.Row
  import org.json4s.JsonDSL._
  import org.json4s.jackson.JsonMethods._

  // Builds one JSON document per joined row; columns are read by position.
  def convertRowToJSON(row: Row): String = {
    val json =
      (
        ("style_id" -> row.getInt(0)) ~
        ("date" -> row.getInt(1)) ~
        ("mrp" -> row.getFloat(2)) ~
        ("revenue" -> row.getFloat(3)) ~
        ("quantity" -> row.getInt(4)) ~
        ("product_discount" -> row.getFloat(5)) ~
        ("coupon_discount" -> row.getFloat(6)) ~
        ("total_discount" -> row.getFloat(7)) ~
        ("list_count" -> row.getInt(8)) ~
        ("add_to_cart_count" -> row.getInt(9)) ~
        ("pdp_count" -> row.getInt(10)) ~
        ("brand" -> row.getString(11)) ~
        ("gender" -> row.getString(12)) ~
        ("article_type" -> row.getString(13)) ~
        ("business_unit" -> row.getString(14)) ~
        ("season" -> row.getString(15)) ~
        ("season_code" -> row.getString(16)) ~
        ("brand_code" -> row.getString(17)) ~
        ("style_catalogued_date" -> row.getString(18)) ~
        ("base_colour" -> row.getString(19)) ~
        ("image" -> row.getString(20)) ~
        ("image_array" -> row.getString(21)) ~
        ("product_attributes" -> row.getString(22))
      )
    // product_attributes is added as a string value here, so it is rendered escaped.
    compact(render(json))
  }

val product_metrics_df = ordersDF.join(product_final_df,"style_id").map(convertRowToJSON)

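Once the JSON was ready, I set the es.input.json property to true and tried again, but with no luck.

I also tried the saveJsonToEs method, with no luck: the JSON is still escaped and treated as a single object.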

product_metrics_df.rdd.saveJsonToEs(elasticConf)
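In the json4s attempt, product_attributes is added with `row.getString(22)`, which makes it a JSON string value, so json4s escapes it before ES-Hadoop ever sees the document. A possible variation, sketched here under assumptions, is to parse the attribute text into a `JValue` first so that it is rendered as a nested object. The sketch assumes the json4s Jackson backend, reduces the document to two fields for brevity, and uses a hypothetical helper name `toDoc`.

```scala
import org.json4s._
import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods._

object Json4sNestedSketch {
  // Hypothetical helper: embeds the attribute JSON as a parsed JValue, not as a string.
  def toDoc(styleId: Int, attrsJson: String): String = {
    val json =
      ("style_id" -> styleId) ~
      ("product_attributes" -> parse(attrsJson)) // JObject instead of JString
    compact(render(json))
  }

  def main(args: Array[String]): Unit = {
    // The printed document contains product_attributes as a nested object.
    println(toDoc(2024270, """{"Fabric":"Cotton","Fit":"Super Skinny Fit"}"""))
  }
}
```

Documents built this way are plain JSON strings, which is the input that `saveJsonToEs` expects. Rows whose attribute text is null or not valid JSON would need a guard before calling `parse`.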

Thanks

0 answers:

No answers