Question

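I am fairly new to Spark and Java programming. Given a JSON file with nested objects, I need to flatten its structure (denormalize the content) and load it into Elasticsearch using Spark.

For example, if the content of my example.json is: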

{
  "title": "Nest eggs",
  "body":  "Making your money work...",
  "tags":  [ "cash", "shares" ],
  "comments": 
    {
      "name":    "John Smith",
      "comment": "Great article",
      "age":     28,
      "stars":   4,
      "date":    "2014-09-01"
    },
  "owner": 
    {
      "name":    "John Smith",
      "age":     28
    }
}

I would like to restructure it into the format below and load it into ES using Spark.

{
  "title": "Nest eggs",
  "body":  "Making your money work...",
  "tags":  [ "cash", "shares" ],
  "comments_name":    "John Smith",
  "comments_comment": "Great article",
  "comments_age":     28,
  "comments_stars":   4,
  "comments_date":    "2014-09-01",
  "owner_name":       "John Smith",
  "owner_age":        28
}

If one of the nested objects is null, the corresponding flattened fields can be null as well.

Any help is appreciated. Thanks.

Answer 1

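The answer you are looking for is here.

In short, you just select the fields you need using dot notation, and alias each one if you want a flat name such as comments_name.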

import sqlContext.implicits._  // enables the $"..." column syntax
val df = sqlContext.read.json(json)
val flattened = df.select($"title", $"comments.name".as("comments_name"))
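
Listing every field by hand gets tedious for larger documents, so here is a rough sketch that applies the same dot-notation select to the whole schema and then writes the result to Elasticsearch. It rests on several assumptions that are not in the question: Spark 2.2 or later (for SparkSession and the multiLine option), the elasticsearch-spark connector on the classpath, an Elasticsearch node on localhost:9200, and a hypothetical index named articles. The object name FlattenToEs and the helper flattenColumns are made up for illustration.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType
import org.elasticsearch.spark.sql._ // adds saveToEs to DataFrame (elasticsearch-spark connector)

object FlattenToEs {

  // Hypothetical helper: walk the schema and emit one column per leaf field.
  // Struct fields are expanded recursively; the nested path "comments.name"
  // is selected and renamed to "comments_name". Arrays such as "tags" are
  // not structs, so they are kept unchanged.
  // Assumes field names do not themselves contain dots.
  def flattenColumns(schema: StructType, prefix: Seq[String] = Nil): Seq[Column] =
    schema.fields.toSeq.flatMap { field =>
      val path = prefix :+ field.name
      field.dataType match {
        case nested: StructType => flattenColumns(nested, path)
        case _                  => Seq(col(path.mkString(".")).as(path.mkString("_")))
      }
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("flatten-json").getOrCreate()

    // multiLine is needed because example.json spans several lines;
    // by default Spark expects one JSON object per line.
    val df = spark.read.option("multiLine", "true").json("example.json")

    val flattened = df.select(flattenColumns(df.schema): _*)
    flattened.printSchema()

    // Assumed connection settings and index name; adjust to your cluster.
    // Older Elasticsearch versions expect "index/type" instead of a bare index.
    val esConfig = Map("es.nodes" -> "localhost", "es.port" -> "9200")
    flattened.saveToEs("articles", esConfig)

    spark.stop()
  }
}
```

When a nested object such as comments is null in a record, selecting comments.name simply yields null, so the flattened columns come out null as the question asks.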
