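I want to use a relational database to analyze information from the Songkick JSON API for local events.
The event objects are complex and deeply nested, so I want to filter and flatten them and convert them to CSV, so that I can load them with standard tools.
Can I use jq to filter and flatten the events?
A typical API response is too large to show here, so I'll show a simplified version with the same relative structure.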
Applying the filter .resultsPage.results.event[] to the response produces a stream of event objects:
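For readers who think in a general-purpose language, here is a rough Python analogue of that filter. It is only a sketch: it assumes the response has already been parsed with json.loads, and RESPONSE_TEXT is a hypothetical, heavily trimmed stand-in for a real Songkick response. The generator plays the role of jq's output stream, yielding each event separately instead of wrapping them in an array.

```python
import json

# Hypothetical, heavily trimmed response text with the same nesting as the
# Songkick payload (only a couple of fields per event are kept).
RESPONSE_TEXT = """
{"resultsPage": {"results": {"event": [
  {"start": {"date": "2014-10-28"}, "venue": {"displayName": "Live At The Star"}},
  {"start": {"date": "2014-10-28"}, "venue": {"displayName": "O2 ABC"}}
]}}}
"""


def iter_events(response):
    """Rough Python counterpart of the jq path .resultsPage.results.event[]:
    walk down the nested objects, then yield each element of the array."""
    yield from response["resultsPage"]["results"]["event"]


events = list(iter_events(json.loads(RESPONSE_TEXT)))
for event in events:
    print(event["start"]["date"], event["venue"]["displayName"])
```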
{
"start": {
"date": "2014-10-28"
},
"performance": [
{
"artist": {
"displayName": "James Keelaghan",
"identifier": [
{
"mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff"
}
]
}
}
],
"venue": {
"displayName": "Live At The Star"
}
}
{
"start": {
"date": "2014-10-28"
},
"performance": [
{
"artist": {
"displayName": "Katy B",
"identifier": [
{
"mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57"
}
]
}
},
{
"artist": {
"displayName": "Becky Hill",
"identifier": [
{
"mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010"
}
]
}
}
],
"venue": {
"displayName": "O2 ABC"
}
}
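Next, I want to produce one output object for each object in the performance array. Each new object should also carry properties of its parent event, such as the date and the venue.
The desired output for the example looks like this: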
{
"venue_name": "Live At The Star",
"artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
"artist_name": "James Keelaghan",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
"artist_name": "Katy B",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
"artist_name": "Becky Hill",
"start_date": "2014-10-28"
}
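Since the end goal is CSV for a relational database, here is a minimal Python sketch of that last step, assuming the events have already been flattened into dicts like the ones above (values taken from the sample events). The column order in COLUMNS is an arbitrary choice for illustration. Within jq itself, the @csv format, which renders an array as one CSV line, is the usual route.

```python
import csv
import io

# Flattened rows, one per performance, with values taken from the sample events.
ROWS = [
    {"venue_name": "Live At The Star", "artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
     "artist_name": "James Keelaghan", "start_date": "2014-10-28"},
    {"venue_name": "O2 ABC", "artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
     "artist_name": "Katy B", "start_date": "2014-10-28"},
    {"venue_name": "O2 ABC", "artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
     "artist_name": "Becky Hill", "start_date": "2014-10-28"},
]

# Illustrative column order; DictWriter looks values up by key, so the key
# order inside each dict does not matter.
COLUMNS = ["start_date", "venue_name", "artist_name", "artist_mbid"]


def rows_to_csv(rows, columns):
    """Serialise a list of flat dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


csv_text = rows_to_csv(ROWS, COLUMNS)
print(csv_text)
```

Fixing the column list explicitly keeps the header stable no matter which order the fields come out of the flattening step.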
If I leave out the mbid, this jq filter gives me what I want:
{
start_date: .start.date,
artist_name: .performance[].artist.displayName,
venue_name: .venue.displayName
}
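Why this filter works: .performance[] is the only expression in the constructor that produces more than one value, so jq emits exactly one object per performance. Below is a Python mirror of that behaviour, offered as a sketch on the sample data (trimmed to the fields the filter reads); names_only is a hypothetical helper name, not part of jq.

```python
# The two sample events from the question, trimmed to the fields used here.
EVENTS = [
    {
        "start": {"date": "2014-10-28"},
        "performance": [{"artist": {"displayName": "James Keelaghan"}}],
        "venue": {"displayName": "Live At The Star"},
    },
    {
        "start": {"date": "2014-10-28"},
        "performance": [
            {"artist": {"displayName": "Katy B"}},
            {"artist": {"displayName": "Becky Hill"}},
        ],
        "venue": {"displayName": "O2 ABC"},
    },
]


def names_only(event):
    """Mirror of the jq filter without the mbid: the single loop over the
    performances is the only source of multiple values, so one dict is
    produced per performance."""
    for perf in event["performance"]:
        yield {
            "start_date": event["start"]["date"],
            "artist_name": perf["artist"]["displayName"],
            "venue_name": event["venue"]["displayName"],
        }


rows = [row for event in EVENTS for row in names_only(event)]
for row in rows:
    print(row)
```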
The result:
{
"venue_name": "Live At The Star",
"artist_name": "James Keelaghan",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_name": "Katy B",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_name": "Becky Hill",
"start_date": "2014-10-28"
}
I also tried this filter to get the mbid:
{
start_date: .start.date,
artist_name: .performance[].artist.displayName,
artist_mbid: .performance[].artist.identifier[].mbid,
venue_name: .venue.displayName
}
The result:
{
"venue_name": "Live At The Star",
"artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
"artist_name": "James Keelaghan",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
"artist_name": "Katy B",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
"artist_name": "Katy B",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
"artist_name": "Becky Hill",
"start_date": "2014-10-28"
}
{
"venue_name": "O2 ABC",
"artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
"artist_name": "Becky Hill",
"start_date": "2014-10-28"
}
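Each object looks plausible on its own, but there are too many of them! The "Katy B" and "Becky Hill" objects each appear twice, once with each of the two mbids.
What is the right way to do this in jq?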
Answer
This filter should work:
.resultsPage.results.event | map(
{
venue_name: .venue.displayName,
start_date: .start.date
}
+
(.performance[].artist | {
artist_mbid: .identifier[].mbid,
artist_name: .displayName
})
)
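The same logic written out in Python, as a sketch for checking the behaviour against the sample (flatten is a hypothetical helper; it mirrors the jq filter rather than replacing it). The key point is that the name and the mbid are read from the same artist inside a single loop over the performances. {**base, ...} merges dicts the way jq's + merges objects, with keys from the right-hand side winning on conflict, and the list comprehension plays the role of map(...) collecting everything into one array.

```python
# The two sample events from the question.
EVENTS = [
    {
        "start": {"date": "2014-10-28"},
        "performance": [
            {"artist": {"displayName": "James Keelaghan",
                        "identifier": [{"mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff"}]}},
        ],
        "venue": {"displayName": "Live At The Star"},
    },
    {
        "start": {"date": "2014-10-28"},
        "performance": [
            {"artist": {"displayName": "Katy B",
                        "identifier": [{"mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57"}]}},
            {"artist": {"displayName": "Becky Hill",
                        "identifier": [{"mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010"}]}},
        ],
        "venue": {"displayName": "O2 ABC"},
    },
]


def flatten(event):
    """Build the event-level fields once, then walk the performances a single
    time, taking the name and every mbid from the same artist."""
    base = {
        "venue_name": event["venue"]["displayName"],
        "start_date": event["start"]["date"],
    }
    for perf in event["performance"]:
        artist = perf["artist"]
        for ident in artist["identifier"]:
            yield {**base, "artist_mbid": ident["mbid"], "artist_name": artist["displayName"]}


rows = [row for event in EVENTS for row in flatten(event)]
for row in rows:
    print(row)
```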
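The fields come out in a different order than in the question, but you can reorder them if needed. Note also that map(...) collects the results into a single array instead of emitting a stream: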
[
{
"venue_name": "Live At The Star",
"start_date": "2014-10-28",
"artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
"artist_name": "James Keelaghan"
},
{
"venue_name": "O2 ABC",
"start_date": "2014-10-28",
"artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
"artist_name": "Katy B"
},
{
"venue_name": "O2 ABC",
"start_date": "2014-10-28",
"artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
"artist_name": "Becky Hill"
}
]
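You want one object per performance, so you have to flatten down to the performance level before you build the objects: iterate .performance[] once and take both the name and the mbid from that same artist. In your second filter, .performance[] appears twice in one object constructor, and jq emits one object for every combination of the two streams, which is where the extra objects came from.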
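To see where the extra objects came from, here is a Python imitation of the question's second filter applied to the O2 ABC event, as a sketch: the two independent iterations over .performance[] are modelled as two separate lists, and itertools.product stands in for the way jq combines several multi-valued expressions inside one object constructor. two_generators is a hypothetical helper name.

```python
from itertools import product

# The O2 ABC event from the sample, trimmed to the fields the filter reads.
EVENT = {
    "start": {"date": "2014-10-28"},
    "performance": [
        {"artist": {"displayName": "Katy B",
                    "identifier": [{"mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57"}]}},
        {"artist": {"displayName": "Becky Hill",
                    "identifier": [{"mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010"}]}},
    ],
    "venue": {"displayName": "O2 ABC"},
}


def two_generators(event):
    """Imitates iterating .performance[] twice in one constructor: the names
    and the mbids come from independent passes, so every name is paired with
    every mbid (a cartesian product), including the wrong pairings."""
    names = [p["artist"]["displayName"] for p in event["performance"]]
    mbids = [i["mbid"] for p in event["performance"] for i in p["artist"]["identifier"]]
    for name, mbid in product(names, mbids):
        yield {
            "venue_name": event["venue"]["displayName"],
            "artist_mbid": mbid,
            "artist_name": name,
            "start_date": event["start"]["date"],
        }


rows = list(two_generators(EVENT))
print(len(rows))  # 2 names x 2 mbids -> 4 objects for this one event
```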