Replacing the value of a nested JSON column

Time: 2019-12-24 22:18:49

Tags: apache-spark

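I want to replace nested field values of a JSON file using a Spark DataFrame, then save the result as another file. By that I mean the entire content with the changed values, not only the changed values. I have tried several approaches; withColumn seems to be the right one, but I could not get it to work. I need expert help. Three fields need to change: feed.ip, ip_location.geo_point.lat, and ip_location.geo_point.lon.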

{
  "device": {
    "browser": "Chrome 62.0",
    "operatingsystemversion": "10"
  },
  "feed": {
    "environment": "prod",
    "ip": "106.223.93.50",
    "useragent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"
  },
  "ip_location": {
    "country_name": "India",
    "geo_point": {
      "lat": 12.983,
      "lon": 77.583
    },
    "postal_code": "",
    "region_name": "Karnataka"
  },
  "tag": {
    "browser_timestamp": "1512118862384",
    "path": {
      "crid": "833575ed-06d6-466d-b72e-024fabd054cc",
      "truncated_url": "/document"
    },
    "prft": "4477",
    "ttfb": "2360",
    "url": {
      "crid": "833575ed-06d6-466d-b72e-024fabd054cc"
    },
    "urlref": {
      "crid": "e31d9e7f-1425-4809-ba64-131a686449db"
    },
    "user_color_depth": "24",
    "topicparentguid": ""
  }
}

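from pyspark.sql.functions import col, lit, struct

myjson = spark.read.json(local_path)
myjson.select("feed.ip").show()
# withColumn("feed.ip", ...) would add a top-level column literally named "feed.ip"; rebuild the feed struct instead
feed = struct(col("feed.environment").alias("environment"), lit("YYY").alias("ip"), col("feed.useragent").alias("useragent"))
myjson = myjson.withColumn("feed", feed)  # DataFrames are immutable, so keep the returned DataFrame
myjson.select("feed.ip").show()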

0 answers:

No answers