How do I parse a JSON file in Spark 2.0 (Scala), and can I insert the data into a Hive table?

Time: 2017-11-16 21:44:18

Tags: json scala hadoop apache-spark data-integration

I want to parse a JSON file in Spark 2.0 (Scala) and then save the data to a Hive table. How can I parse the JSON file using Scala? Example JSON file (metadata.json):

    {
        "syslog": {
            "month": "Sep",
            "day": "26",
            "time": "23:03:44",
            "host": "cdpcapital.onmicrosoft.com"
        },
        "prefix": {
            "cef_version": "CEF:0",
            "device_vendor": "Microsoft",
            "device_product": "SharePoint Online"
        },
        "extensions": {
            "eventId": "7808891",
            "msg": "ManagedSyncClientAllowed",
            "art": "1506467022378",
            "cat": "SharePoint",
            "act": "ManagedSyncClientAllowed",
            "rt": "1506466717000",
            "requestClientApplication": "Microsoft SkyDriveSync",
            "cs1": "0bdbe027-8f50-4ec3-843f-e27c41a63957",
            "cs1Label": "Organization ID",
            "cs2Label": "Modified Properties",
            "ahost": "cdpdiclog101.cgimss.com",
            "agentZoneURI": "/All Zones",
            "amac": "F0-1F-AF-DA-8F-1B",
            "av": "7.6.0.8009.0"
        }
    }
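When Spark reads a JSON object like this, each nested object becomes a struct column, so individual fields can be addressed with dot notation. A minimal sketch of what that looks like (assuming `spark` is an active `SparkSession`; column names are taken from the JSON above):

```scala
// Because this JSON object spans multiple lines, the multiLine option is
// needed (Spark 2.2+; earlier 2.x previews called it "wholeFile").
val df = spark.read
  .option("multiLine", true)
  .json("resources/json/metadata.json")

// Nested objects are struct columns; drill into them with dot notation.
df.select("syslog.host", "prefix.device_product", "extensions.eventId")
  .show(truncate = false)
```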

Thanks

1 Answer:

Answer 0 (score: 0)

You can use the following:

    val jsonDf = sparkSession
      .read
      // .option("multiLine", true) // needed if the JSON is not single-line (Spark 2.2+; earlier versions used "wholeFile")
      .json("resources/json/metadata.json")

    jsonDf.printSchema()

    // Note: registerTempTable is deprecated since Spark 2.0; createOrReplaceTempView is the replacement.
    jsonDf.createOrReplaceTempView("metadata")

More details on this here: https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
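The snippet above only registers a temporary view; it does not persist anything to Hive. To land the data in an actual Hive table, the `SparkSession` must be built with Hive support, after which the DataFrame can be written with `saveAsTable` or inserted via SQL. A hedged sketch (the table name `metadata_tbl` is illustrative, not from the question):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hive support must be enabled for saveAsTable to use the Hive metastore.
val spark = SparkSession.builder()
  .appName("JsonToHive")
  .enableHiveSupport()
  .getOrCreate()

val jsonDf = spark.read
  .option("multiLine", true) // Spark 2.2+; drop for single-line JSON
  .json("resources/json/metadata.json")

// Persist as a Hive-managed table ("metadata_tbl" is an assumed name).
jsonDf.write
  .mode(SaveMode.Overwrite)
  .saveAsTable("metadata_tbl")

// Alternatively, insert into an existing Hive table via SQL:
// jsonDf.createOrReplaceTempView("metadata")
// spark.sql("INSERT INTO TABLE my_existing_table SELECT * FROM metadata")
```

Nested struct columns are stored as-is; if the Hive table needs flat columns, `select` the individual fields (e.g. `"extensions.eventId"`) before writing.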