Athena: fetching the latest file in a bucket

Time: 2019-02-19 15:07:12

Tags: amazon-s3 amazon-athena

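I'm new to Athena and S3. We have set up Athena to access S3 buckets that are linked to a database, and each bucket holds a table of the same data for each day. For example: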

database-name - "sales"
tables: ["19.02.2019", "18.02.2019",..."01.02.2019"]

To query the tables, I need to run something like the following example:

SELECT 
a.creation_date,
a.number,
pa.customer_number,
a.customer_type,
a.name,
a.city,
a.country,
a.type,
a.business,
b.industry,
cu.group,
cu.closing_date,
cu.interest_flag
FROM 
    (SELECT a.creation_date,
     a.type,
     a.number,
     a.customer_type,
     a.business,
     a.id,
     b.industry,
     customer.id,
     concat (p.first_name, ' ' ,p.last_name) AS name, p.address, p.country
    FROM "accounts"."2019_02_19_01_32_18" AS a
    LEFT JOIN "customers"."2019_02_19_02_31_03" AS c
        ON a.id=c.id
    LEFT JOIN "people"."2019_02_19_06_05_10" AS p
        ON c.person_id=p.id
    LEFT JOIN "strategic_partners"."2019_02_18_05_57_59" AS par
        ON par.uid=p.strapartner_uid
    WHERE a.number is NOT null  and a.customer_type = (1)

    UNION

    SELECT a.creation_date,
    a.type,
    a.number,
    a.customer_type,
    a.business_name,
    a.id,
    b.industry,
    customer.id,
    concat (p.first_name, ' ',p.last_name) AS name, p.address, p.country
    FROM "accounts"."2019_02_19_01_32_18" AS a
    LEFT JOIN "customers"."2019_02_19_02_31_03" AS c
        ON a.id=c.id
    LEFT JOIN "people"."2019_02_19_06_05_10" AS p
        ON c.person_id=p.id
    LEFT JOIN "strategic_partners"."2019_02_18_05_57_59" AS par
        ON par.uid=p.strapartner_uid
    WHERE a.number is NOT null and a.customer_type IN (4,8)
    ) AS a

    LEFT JOIN "progressive_accounts"."2019_02_18_18_15_28" AS pa
     ON pa.credit_number = a.credit_number
    LEFT JOIN "progressive_customer"."2019_02_18_18_15_01" AS cu
     ON pa.prog_number=cu.prog_number
     WHERE a.creation_date>='2018-10-01' AND a.creation_date<='2018-12-31'
     ORDER BY a.creation_date desc, a.business_name asc

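I'm trying to determine whether there is a way to query the latest available table dynamically. Is it possible to use a function in the query, or is there an alternative solution?

A follow-up question is why I can't use CREATE VIEW with this query. I get an error: Your query has the following error(s):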

Access denied when writing to location: s3://dp-jupyterlabXXXXXXXXXXXXXX/notebooks/<username>/athena/Unsaved/2019/02/25/<unique reference id>.txt

This query ran against the "database name" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: xxxxxx-xxxx-xxxxx-xxxx-xxxxxxxxxxx.

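When I run the SELECT statement on its own, the query succeeds and produces the expected results.

I can't work out why the error is returned. To check whether it is a permissions issue, I have added the following policies to my role: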

- glue access to the bucket

- all glue policies to the user

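I also can't work out why Athena tries to create the VIEW in the database selected in the drop-down menu on the right-hand side of the Athena console, rather than in a "public" database as PostgreSQL or similar systems would.

Any guidance would be great!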

1 answer:

Answer 0: (score: 1)

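You can't use a subquery to return the name of the table that a query reads from.

Instead, each day you could use CREATE OR REPLACE VIEW to point a view at the "latest" table. Then just query the view.

You presumably have some daily task that creates each table, so have that task update the view as well.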
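As a rough sketch of what that daily task might do, the Python below picks the newest table name and builds the CREATE OR REPLACE VIEW statement for it. It assumes the tables are named with zero-padded timestamps like "2019_02_19_01_32_18", as in the question. The view name `accounts_latest` and the helper functions are hypothetical, and listing the tables and submitting the statement to Athena are left to whatever client the task already uses.

```python
import re

# Assumption: table names look like "2019_02_19_01_32_18" (YYYY_MM_DD_HH_MM_SS).
TIMESTAMP_NAME = re.compile(r"\d{4}(_\d{2}){5}")


def latest_table(table_names):
    """Return the most recent timestamp-named table, ignoring other names."""
    candidates = [name for name in table_names if TIMESTAMP_NAME.fullmatch(name)]
    if not candidates:
        raise ValueError("no timestamp-named tables found")
    # Zero-padded, most-significant-first timestamps sort correctly as strings.
    return max(candidates)


def create_view_sql(view_name, database, table_names):
    """Build a CREATE OR REPLACE VIEW statement that points at the latest table."""
    table = latest_table(table_names)
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS "
        f'SELECT * FROM "{database}"."{table}"'
    )


if __name__ == "__main__":
    # Hypothetical listing of the tables in the "accounts" database.
    tables = ["2019_02_17_01_30_00", "2019_02_19_01_32_18", "2019_02_18_01_31_45"]
    print(create_view_sql("accounts_latest", "accounts", tables))
```

With one such view per source database, the long query in the question could select from the views instead of hard-coded timestamped table names, and it would not need editing each day.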