How do I flatten this stream of objects without creating duplicate objects?

Time: 2014-10-31 01:19:40

Tags: jq songkick

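I want to use a relational database to analyze information from the Songkick JSON API for local events.

The event objects are complex and deeply nested, so I want to filter and flatten them and convert them to CSV so that I can load them with standard tools.

Can I use jq to filter and flatten the events?

A typical response from the API is too large to show here, so I will show a simplified version with the same relative structure.

Applying the filter .resultsPage.results.event[] to the response produces a stream of event objects.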

{
  "start": {
    "date": "2014-10-28"
  },
  "performance": [
    {
      "artist": {
        "displayName": "James Keelaghan",
        "identifier": [
          {
            "mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff"
          }
        ]
      }
    }
  ],
  "venue": {
    "displayName": "Live At The Star"
  }
}
{
  "start": {
    "date": "2014-10-28"
  },
  "performance": [
    {
      "artist": {
        "displayName": "Katy B",
        "identifier": [
          {
            "mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57"
          }
        ]
      }
    },
    {
      "artist": {
        "displayName": "Becky Hill",
        "identifier": [
          {
            "mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010"
          }
        ]
      }
    }
  ],
  "venue": {
    "displayName": "O2 ABC"
  }
}

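Next, I want to produce one output object for each object in the performance list. These new objects should carry attributes of the containing event object, such as the date and the venue.

The correct output for the example looks like this.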

{
  "venue_name": "Live At The Star",
  "artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
  "artist_name": "James Keelaghan",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
  "artist_name": "Katy B",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
  "artist_name": "Becky Hill",
  "start_date": "2014-10-28"
}

If I ignore the mbid, this jq filter gives me what I want.

{
  start_date: .start.date,
  artist_name: .performance[].artist.displayName,
  venue_name: .venue.displayName
}

The result looks like this.

{
  "venue_name": "Live At The Star",
  "artist_name": "James Keelaghan",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_name": "Katy B",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_name": "Becky Hill",
  "start_date": "2014-10-28"
}

I also tried this filter to get the mbid.

{
  start_date: .start.date,
  artist_name: .performance[].artist.displayName,
  artist_mbid: .performance[].artist.identifier[].mbid,
  venue_name: .venue.displayName
}

The result looks like this.

{
  "venue_name": "Live At The Star",
  "artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
  "artist_name": "James Keelaghan",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
  "artist_name": "Katy B",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
  "artist_name": "Katy B",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
  "artist_name": "Becky Hill",
  "start_date": "2014-10-28"
}
{
  "venue_name": "O2 ABC",
  "artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
  "artist_name": "Becky Hill",
  "start_date": "2014-10-28"
}

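Each object looks plausible on its own, but there are too many of them! The "Katy B" and "Becky Hill" objects are duplicated, each one paired with both mbids.

What is the correct way to do this in jq?

1 Answer:

Answer 0 (score: 1)

This filter should work: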

.resultsPage.results.event | map(
    {
        venue_name: .venue.displayName,
        start_date: .start.date
    }
    +
    (.performance[].artist | {
        artist_mbid: .identifier[].mbid,
        artist_name: .displayName
    })
)

The fields come out in a different order, but you can always reorder them if needed:

[
  {
    "venue_name": "Live At The Star",
    "start_date": "2014-10-28",
    "artist_mbid": "08e5954e-efc0-4a95-95ac-d74cca5b79ff",
    "artist_name": "James Keelaghan"
  },
  {
    "venue_name": "O2 ABC",
    "start_date": "2014-10-28",
    "artist_mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57",
    "artist_name": "Katy B"
  },
  {
    "venue_name": "O2 ABC",
    "start_date": "2014-10-28",
    "artist_mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010",
    "artist_name": "Becky Hill"
  }
]
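
Since the end goal is CSV, a variant of the same idea could emit rows directly. This is only a sketch under a few assumptions: `response` is a hypothetical stand-in that holds just the second event from the question, the column order is chosen to match the expected output in the question, and `.identifier[0]` (the first identifier, or null when the list is empty) is my guess at how artists with several or no identifiers should be handled.

```shell
# Hypothetical stand-in for the API response (second event from the question).
response='{"resultsPage": {"results": {"event": [
  {
    "start": {"date": "2014-10-28"},
    "performance": [
      {"artist": {"displayName": "Katy B",
                  "identifier": [{"mbid": "2df30b6c-997d-4c3f-abb5-5e0d6317ea57"}]}},
      {"artist": {"displayName": "Becky Hill",
                  "identifier": [{"mbid": "27bc6f5b-4585-49ab-8d7d-c62b59f5f010"}]}}
    ],
    "venue": {"displayName": "O2 ABC"}
  }
]}}}'

# Bind the event-level fields once, iterate over the performances once,
# then build one CSV row per artist.
csv=$(printf '%s' "$response" | jq -r '
  .resultsPage.results.event[]
  | .venue.displayName as $venue
  | .start.date as $date
  | .performance[].artist
  # Assumption: take the first identifier only; null becomes an empty CSV field.
  | [$venue, .identifier[0].mbid, .displayName, $date]
  | @csv
')
printf '%s\n' "$csv"
```

Because the row is built as an array, the column order is whatever you write inside the brackets, which sidesteps the key-ordering question entirely.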

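You are trying to create one object per performance, so you have to flatten the performances first and only then build the result objects. In your second filter, .performance[] appears twice inside the object construction, and jq iterates each occurrence independently, so you get every combination of name and mbid.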
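
The duplication can be reproduced without the Songkick data. In this minimal sketch the names and ids are made up for illustration: the first command puts two generators inside one object construction, and the second iterates a list once and builds each object from a single element.

```shell
# Two generators in one object construction: jq emits every combination,
# so 2 names x 2 ids gives 4 objects.
product=$(jq -c -n '{name: ("Katy B", "Becky Hill"), id: (1, 2)}')
printf '%s\n' "$product"

# One iteration over the list, then an object per element: 2 objects.
paired=$(jq -c -n '
  [{name: "Katy B", id: 1}, {name: "Becky Hill", id: 2}]
  | .[]
  | {name, id}
')
printf '%s\n' "$paired"
```

The first command prints four objects and the second prints two, which mirrors the five-versus-three difference in the question.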