I'm working with a JSON object and would like to transform object.hours into a relational table using Spark SQL DataFrames/Datasets.
I tried using explode, but it doesn't really support an "array of structs" like this.
The JSON object is below:
I'd like to flatten it into a relational table with one row per day.
{
"business_id": "abc",
"full_address": "random_address",
"hours": {
"Monday": {
"close": "02:00",
"open": "11:00"
},
"Tuesday": {
"close": "02:00",
"open": "11:00"
},
"Friday": {
"close": "02:00",
"open": "11:00"
},
"Wednesday": {
"close": "02:00",
"open": "11:00"
},
"Thursday": {
"close": "02:00",
"open": "11:00"
},
"Sunday": {
"close": "00:00",
"open": "11:00"
},
"Saturday": {
"close": "02:00",
"open": "11:00"
}
}
}
Answer 0 (score: 3)
You can do it with the following trick:
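As a setup sketch (not part of the answer's original code): assuming the JSON object above is saved to a file, it can be loaded into the DataFrame `df` used below. The file name `hours.json` is an assumption.

```scala
import org.apache.spark.sql.SparkSession

// Minimal local setup; "hours.json" is a hypothetical path
val spark = SparkSession.builder()
  .appName("hours-to-table")
  .master("local[*]")
  .getOrCreate()

// multiLine is required because the JSON object spans several lines
val df = spark.read.option("multiLine", "true").json("hours.json")
df.printSchema()
```
`printSchema` should show `hours` as a struct whose fields are the day names, which is what the trick below relies on.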
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}
import org.apache.spark.sql.types.StructType
import spark.implicits._

// Read the day names dynamically from the schema of the `hours` struct,
// so nothing is hard-coded
val days = df.schema
  .fields
  .filter(_.name == "hours")
  .head
  .dataType
  .asInstanceOf[StructType]
  .fieldNames

// Build one struct per day, collect them into an array, then explode the
// array into one row per day
val solution = df
  .select(
    $"business_id",
    $"full_address",
    explode(
      array(
        days.map(d => struct(
          lit(d).as("day"),
          col(s"hours.$d.open").as("open_time"),
          col(s"hours.$d.close").as("close_time")
        )): _*
      )
    )
  )
  .select($"business_id", $"full_address", $"col.*")
scala> solution.show
+-----------+--------------+---------+---------+----------+
|business_id| full_address| day|open_time|close_time|
+-----------+--------------+---------+---------+----------+
| abc|random_address| Friday| 11:00| 02:00|
| abc|random_address| Monday| 11:00| 02:00|
| abc|random_address| Saturday| 11:00| 02:00|
| abc|random_address| Sunday| 11:00| 00:00|
| abc|random_address| Thursday| 11:00| 02:00|
| abc|random_address| Tuesday| 11:00| 02:00|
| abc|random_address|Wednesday| 11:00| 02:00|
+-----------+--------------+---------+---------+----------+
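An alternative sketch for the same unpivot uses Spark SQL's built-in `stack` function instead of `array` + `explode`. The hard-coded day list here is an assumption for brevity; the schema-derived `days` from the answer above would work equally well.

```scala
// stack(n, k1, o1, c1, k2, o2, c2, ...) emits n rows of (day, open_time, close_time)
val days = Seq("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
val stackArgs = days.map(d => s"'$d', hours.$d.open, hours.$d.close").mkString(", ")

val alt = df.selectExpr(
  "business_id",
  "full_address",
  s"stack(${days.length}, $stackArgs) as (day, open_time, close_time)"
)
```
This avoids building intermediate structs, at the cost of assembling a SQL expression string.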