AWS Athena: flatten a nested column without losing rows where it is null

Time: 2019-12-16 18:31:03

Tags: amazon-web-services amazon-s3 amazon-athena

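I am using AWS Athena to query Jira data that is stored in S3 as JSON. Our data has a custom field containing an array of the applications affected by a bug. I want to produce one row per affected application, while still keeping any issues that do not list any affected applications.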

My data

{
    "expand": "operations,versionedRepresentations,editmeta,changelog,renderedFields",
    "id": "982832",
    "self": "https://reallycoolcompany.atlassian.net/rest/api/2/issue/982832",
    "key": "HI-1",
    "fields": {
        "summary": "Customer Care unresponsive on Web01",
        "customfield_14402": null
    }
}
{
    "expand": "operations,versionedRepresentations,editmeta,changelog,renderedFields",
    "id": "332422",
    "self": "https://reallycoolcompany.atlassian.net/rest/api/2/issue/332422",
    "key": "HI-2",
    "fields": {
        "summary": "RCC API issue - clients were experiencing a failure when navigating through the site.",
        "customfield_14402": null
    }
}
{
    "expand": "operations,versionedRepresentations,editmeta,changelog,renderedFields",
    "id": "114128",
    "self": "https://reallycoolcompany.atlassian.net/rest/api/2/issue/114128",
    "key": "HI-3",
    "fields": {
        "summary": "HIE customer can't connect",
        "customfield_14402": [
            {
                "self": "https://reallycoolcompany.atlassian.net/rest/api/2/customFieldOption/14724",
                "value": "ECC",
                "id": "14724"
            }
        ]
    }
}
{
    "expand": "operations,versionedRepresentations,editmeta,changelog,renderedFields",
    "id": "723392",
    "self": "https://reallycoolcompany.atlassian.net/rest/api/2/issue/723392",
    "key": "HI-4",
    "fields": {
        "summary": "Database Lock-Up Following Roll",
        "customfield_14402": [
            {
                "self": "https://reallycoolcompany.atlassian.net/rest/api/2/customFieldOption/14722",
                "value": "CC",
                "id": "14722"
            },
            {
                "self": "https://reallycoolcompany.atlassian.net/rest/api/2/customFieldOption/14724",
                "value": "ECC",
                "id": "14724"
            }
        ]
    }
}

Desired output

I would like a query I can run in Athena that returns five rows, like this:

|-----------|--------------------------|
|    key    |     Affected App         |
|-----------|--------------------------|
| HI-1      | null                     |
|-----------|--------------------------|
| HI-2      | null                     |
|-----------|--------------------------|
| HI-3      | ECC                      |
|-----------|--------------------------|
| HI-4      | ECC                      |
|-----------|--------------------------|
| HI-4      | CC                       |
|-----------|--------------------------|

What I have tried so far

I can get one row per affected application (customfield_14402) with the following query (solution found here):

SELECT key, affected_apps.value AS affected_apps
FROM jira_quicksight_poc
CROSS JOIN UNNEST(fields.customField_14402) AS t(affected_apps)

However, this excludes my two issues that have no affected applications.

I can work around this by creating a view from the first query and then joining that view in a second query.

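However, that causes my data in S3 to be scanned twice. I am trying to avoid this, because under Athena's pricing it would effectively double what we pay for this query.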

Is it possible to get the desired result with a single query, scanning the data only once?
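One direction that might work, offered as a sketch rather than a verified answer: CROSS JOIN UNNEST produces no rows when the array is null, so substituting a one-element array that holds a single null should keep those issues in the result while reading the table only once. The query below assumes that `value` is exposed as `varchar` in the table definition and that the engine version supports `transform` with a lambda; the alias `affected_app` is made up for illustration.

```sql
-- Sketch only: assumes fields.customfield_14402 is array(row(..., value varchar, ...)).
SELECT
    key,
    affected_app
FROM jira_quicksight_poc
CROSS JOIN UNNEST(
    COALESCE(
        -- Reduce each array element to its "value" string; a null array stays null.
        transform(fields.customfield_14402, app -> app.value),
        -- Fallback for null arrays: one null element, so the issue still yields one row.
        ARRAY[CAST(NULL AS varchar)]
    )
) AS t(affected_app)
```

On the sample data this is intended to give one row each for HI-1, HI-2 and HI-3 and two rows for HI-4. An empty array (as opposed to null) would still produce no rows, so that case would need separate handling if it occurs in the data.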

0 answers:

No answers