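I'm looking for advice on the problem in the title. I have read in the Databricks documentation (https://docs.databricks.com/spark/latest/data-sources/read-json.html) that a multi-line JSON file can be read into a DataFrame with the following expression: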
println("2.2 Dataframe Multiline")
// MULTILINE MODE!!
val df2 = spark.read.option("multiline", "true").option("charset", "UTF-8").json("EXPORT1.json")
df2.printSchema()
This does not work for me. If I manually remove all the line breaks from the JSON, I get the following result:
root
|-- results: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- address_components: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- long_name: string (nullable = true)
| | | | |-- short_name: string (nullable = true)
| | | | |-- types: array (nullable = true)
| | | | | |-- element: string (containsNull = true)
| | |-- formatted_address: string (nullable = true)
| | |-- geometry: struct (nullable = true)
| | | |-- bounds: struct (nullable = true)
| | | | |-- northeast: struct (nullable = true)
| | | | | |-- lat: double (nullable = true)
| | | | | |-- lng: double (nullable = true)
| | | | |-- southwest: struct (nullable = true)
| | | | | |-- lat: double (nullable = true)
| | | | | |-- lng: double (nullable = true)
| | | |-- location: struct (nullable = true)
| | | | |-- lat: double (nullable = true)
| | | | |-- lng: double (nullable = true)
| | | |-- location_type: string (nullable = true)
| | | |-- viewport: struct (nullable = true)
| | | | |-- northeast: struct (nullable = true)
| | | | | |-- lat: double (nullable = true)
| | | | | |-- lng: double (nullable = true)
| | | | |-- southwest: struct (nullable = true)
| | | | | |-- lat: double (nullable = true)
| | | | | |-- lng: double (nullable = true)
| | |-- place_id: string (nullable = true)
| | |-- types: array (nullable = true)
| | | |-- element: string (containsNull = true)
|-- status: string (nullable = true)
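For reference, here is a minimal sketch of how a schema like the one above could be flattened into one row per result. It assumes a Spark version in which the `multiLine` JSON option is available (it was added in Spark 2.2) and a local `SparkSession`; the object name `MultiLineJsonSketch`, the helper `flatten`, and the shortened sample document are hypothetical and only serve the illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object MultiLineJsonSketch {
  // Hypothetical, shortened version of the geocoding response, pretty-printed over several lines.
  val sample: String =
    """{
      |  "results" : [
      |    {
      |      "formatted_address" : "30152 Murcia, Spain",
      |      "geometry" : { "location" : { "lat" : 37.9569734, "lng" : -1.1496969 } }
      |    }
      |  ],
      |  "status" : "OK"
      |}""".stripMargin

  // Reads a multi-line JSON file and returns one row per entry of the "results" array.
  def flatten(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("multiLine", "true") // assumption: Spark 2.2 or later
      .json(path)
      .select(explode(col("results")).as("r"))
      .select(
        col("r.formatted_address").as("formatted_address"),
        col("r.geometry.location.lat").as("lat"),
        col("r.geometry.location.lng").as("lng"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("multiline-json-sketch").getOrCreate()
    // Write the sample to a temporary file so the sketch does not depend on EXPORT1.json.
    val path = Files.createTempFile("geocode", ".json")
    Files.write(path, sample.getBytes(StandardCharsets.UTF_8))
    flatten(spark, path.toString).show(false)
    spark.stop()
  }
}
```

If the option is not recognised by the Spark version in use, the file is parsed line by line, which would explain why only the single-line version of the file loads correctly.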
Here is a sample of the JSON I download from Google:
{
"results" : [
{
"address_components" : [
{
"long_name" : "30152",
"short_name" : "30152",
"types" : [ "postal_code" ]
},
{
"long_name" : "Murcia",
"short_name" : "Murcia",
"types" : [ "locality", "political" ]
},
{
"long_name" : "Murcia",
"short_name" : "MU",
"types" : [ "administrative_area_level_2", "political" ]
},
{
"long_name" : "Region of Murcia",
"short_name" : "Region of Murcia",
"types" : [ "administrative_area_level_1", "political" ]
},
{
"long_name" : "Spain",
"short_name" : "ES",
"types" : [ "country", "political" ]
}
],
"formatted_address" : "30152 Murcia, Spain",
"geometry" : {
"bounds" : {
"northeast" : {
"lat" : 37.9659196,
"lng" : -1.1346723
},
"southwest" : {
"lat" : 37.9442828,
"lng" : -1.1687921
}
},
"location" : {
"lat" : 37.9569734,
"lng" : -1.1496969
},
"location_type" : "APPROXIMATE",
"viewport" : {
"northeast" : {
"lat" : 37.9659196,
"lng" : -1.1346723
},
"southwest" : {
"lat" : 37.9442828,
"lng" : -1.1687921
}
}
},
"place_id" : "ChIJZbDcb0Z_Yw0RUK0TPnKvAhw",
"types" : [ "postal_code" ]
}
],
"status" : "OK"
}
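Since I want to send many requests to Google, I cannot remove the line breaks by hand.
Can anyone help me? Thanks in advance.
Answer 0 (score: 0)
To solve the problem, I store the JSON with all the line breaks removed.
The following class takes the address, component, API key and file name, and writes the response of the Geolocation request to a JSON file: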
import java.io.{BufferedWriter, File, FileWriter}
import scala.util.{Failure, Success}
import dispatch._, Defaults._

// Sends one geocoding request and writes the response to <JSONName>_LatLon.json on a single line.
class Geolocation(var Address: String, var Component: String, var APIKey: String, var JSONName: Int) {
  val GeoLocURL_REQ = "https://maps.googleapis.com/maps/api/geocode/json?address=" + Address + "&components=" + Component + "&key=" + APIKey
  val filename = JSONName.toString + "_LatLon.json"
  val file = new File(filename)
  val bw = new BufferedWriter(new FileWriter(file))
  val svc = url(GeoLocURL_REQ)
  val response: Future[String] = Http(svc OK as.String)
  response onComplete {
    case Success(content) =>
      println("worked! " + content)
      // Remove only the line breaks; "\\s" would also delete the spaces
      // inside string values such as "Region of Murcia".
      bw.write(content.replaceAll("[\\r\\n]+", ""))
      bw.close()
    case Failure(t) =>
      println("failed: " + t.getMessage)
      bw.close() // release the file handle on failure as well
  }
}

var APIKey = "TYPE YOUR OWN API KEY HERE"
var PostalCode = 30152
var Localidad = "Murcia"
val Component = "postal_code=" + PostalCode + "%7Ccountry=ES" // "|" is URL-encoded as %7C
var Address = Localidad + "+" + PostalCode
val geolocation = new Geolocation(Address, Component, APIKey, PostalCode)
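If the goal is to squeeze the response onto one line without touching the text inside string values, the whitespace can be dropped only where it sits outside string literals. The following is a rough sketch using nothing but the Scala standard library; `JsonCompactor` and `compact` are hypothetical names, and it assumes the input is well-formed JSON.

```scala
object JsonCompactor {
  /** Removes all whitespace that lies outside JSON string literals.
    * Assumes well-formed JSON; content inside strings is copied unchanged.
    */
  def compact(json: String): String = {
    val out = new StringBuilder(json.length)
    var inString = false // true while we are between two unescaped double quotes
    var escaped = false  // true right after a backslash inside a string
    for (c <- json) {
      if (inString) {
        out.append(c)
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else if (c == '"') {
        inString = true
        out.append(c)
      } else if (!c.isWhitespace) {
        out.append(c)
      }
    }
    out.toString
  }
}
```

In the class above, this could stand in for the `replaceAll` call, for example `bw.write(JsonCompactor.compact(content))`.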
Hope this helps someone!
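A side note on the request URL: instead of writing `%7C` and `+` by hand, the parameter values can be passed through `java.net.URLEncoder`. This is only a sketch; `GeocodeUrl` and `build` are hypothetical names, and note that the encoder also turns the `=` inside the components value into `%3D`, which the server decodes back to `=`.

```scala
import java.net.URLEncoder

object GeocodeUrl {
  private val base = "https://maps.googleapis.com/maps/api/geocode/json"

  // Percent-encodes a single query parameter value (a space becomes "+", "|" becomes "%7C").
  private def enc(value: String): String = URLEncoder.encode(value, "UTF-8")

  // Builds the request URL from unencoded parameter values.
  def build(address: String, components: String, apiKey: String): String =
    s"$base?address=${enc(address)}&components=${enc(components)}&key=${enc(apiKey)}"
}
```

With this helper, the address and component can be written in plain form, for example `GeocodeUrl.build("Murcia 30152", "postal_code=30152|country=ES", APIKey)`.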