Question

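Say I have the following monthly tables, named so that the number after the underscore is the month. I want to combine these 12 tables into one without writing 10-30 insert / union all statements.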

table_1
table_2
table_3
table_4
table_5
table_6
table_7
table_8
table_9
table_10
table_11
table_12 -- (only 12 in this instance but could be as many as 36)

My current approach is to first create the master table from the data in table_1.

create temporary table master_table_1_12 as
select *                                       -- * to keep it simple for this example
from table_1;

Then I use variables, so that I can keep hitting the run button until I get a "table_13 does not exist" error:

set month_id=(select max(month_id) from master_table_1_12) + 1;
set table_name=concat('table_',$month_id);
insert into master_table_1_12 select * from identifier($table_name);

Note: all monthly tables have a month_id column.

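Sure, it saves some space on the console (compared with multiple inserts), but I still have to run it 12 times. Could I use a Snowflake task for this? I couldn't find a suitable example in their documentation to code from, but if anyone has solved this kind of problem, or has used a JavaScript-based SP for it, please enlighten me.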

Answer 1

Here is a stored procedure that inserts the rows selected from table_1 through table_12 into master_table_1_12. Modify as needed:

create or replace procedure FILL_MASTER_TABLE()
returns string
language javascript
as
$$
    var rows = 0;
    // Start at 2 instead of 1 if master_table_1_12 was already seeded from table_1.
    for (var i = 1; i <= 12; i++) {
        rows += insertRows(i);
    }
    return rows + " rows inserted into master_table_1_12.";

// End of main function

// Build the INSERT statement for month i and run it.
function insertRows(i) {
    var sql = `insert into master_table_1_12
               select *
               from table_${i};`;
    return doInsert(sql);
}

// Run an INSERT and return the row count Snowflake reports for it.
function doInsert(queryString) {
    var stmt = snowflake.createStatement({sqlText: queryString});
    var rs = stmt.execute();
    rs.next();
    return rs.getColumnValue(1);
}
$$;

call fill_master_table();
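
The question mentions that the number of monthly tables varies (12 here, possibly as many as 36). One way to avoid hard-coding the upper bound is to keep inserting until a statement fails. The following is only a sketch: insertUntilMissing and its execute parameter are hypothetical names, and it assumes that any failure means the next table does not exist, which may be too coarse for real use.

```javascript
// Hypothetical helper: insert from table_<first>, table_<first+1>, ...
// until a statement fails or maxTables tables have been tried.
// `execute` is an injected function that runs one SQL string and returns
// the number of rows inserted; it is expected to throw on a missing table.
function insertUntilMissing(execute, first, maxTables) {
    var total = 0;
    var loaded = 0;
    for (var i = first; i < first + maxTables; i++) {
        var sql = "insert into master_table_1_12 select * from table_" + i;
        try {
            total += execute(sql);
        } catch (err) {
            // Assumption: any error here means table_<i> does not exist.
            break;
        }
        loaded++;
    }
    return { tables: loaded, rows: total };
}
```

Inside the procedure body, execute could be the doInsert function shown above, for example insertUntilMissing(doInsert, 1, 36).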

By the way, if you have no processing to do and only need to combine the tables, you can do this:

insert into master_table_1_12 
select * from table_1
    union all
select * from table_2
    union all
select * from table_3
    union all
select * from table_4
    union all
select * from table_5
    union all
select * from table_6
    union all
select * from table_7
    union all
select * from table_8
    union all
select * from table_9
    union all
select * from table_10
    union all
select * from table_11
    union all
select * from table_12
;
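
Typing out the statement above gets tedious as the number of tables grows. Below is a minimal sketch that generates the same text in JavaScript. buildUnionAllInsert is a hypothetical helper name, and the table_<n> naming is taken from the question.

```javascript
// Hypothetical helper: build one INSERT ... SELECT ... UNION ALL statement
// covering table_<first> through table_<last>.
function buildUnionAllInsert(target, first, last) {
    var selects = [];
    for (var i = first; i <= last; i++) {
        selects.push("select * from table_" + i);
    }
    return "insert into " + target + "\n" +
           selects.join("\n    union all\n") + "\n;";
}
```

In a Snowflake JavaScript stored procedure, the resulting string could be passed as sqlText to snowflake.createStatement, so that all tables are loaded by a single statement.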

Answer 2

Can't you create a view on top of these 12 tables? The view would be the union of all of them.

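Following the comments below, I have expanded my answer. Please try this approach. It will give better performance when your tables are large. Partitioning the data will improve performance. This is based on real-world experience.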

CREATE TABLE SALES_2000 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2001 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2002 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2003 (REGION VARCHAR, UNITS_SOLD NUMBER);

INSERT INTO SALES_2000 VALUES('ASIA', 25);
INSERT INTO SALES_2001 VALUES('ASIA', 50);
INSERT INTO SALES_2002 VALUES('ASIA', 55);
INSERT INTO SALES_2003 VALUES('ASIA', 65);

-- UNION ALL keeps every row; plain UNION would also deduplicate across tables.
CREATE VIEW ALL_SALES AS
SELECT * FROM SALES_2000
UNION ALL
SELECT * FROM SALES_2001
UNION ALL
SELECT * FROM SALES_2002
UNION ALL
SELECT * FROM SALES_2003;


SELECT * FROM ALL_SALES WHERE UNITS_SOLD = 25;

Answer 3

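For my use case, I ended up creating a UDF that spits out a create view statement for the tables matching a pattern I specify in a select query. I work with tables that follow a very specific naming convention, shown below, where the only thing distinguishing similar tables is the number between _ and m. This works well for me, but your mileage may vary.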

sales_division_123m
sales_division_124m
sales_division_125m
....

I created the function:

create or replace function combine_tables (table_pattern varchar(100))
returns table ("" varchar(10000)) as --empty header for easy copying and pasting

$$  
(
select 'create or replace view named_whatever as'
union all
select  concat('select * from ' , 
                lower(table_name), 
                case when table_name < max(table_name) over() then ' union all' 
                     else ';' end)
from warehouse_name.information_schema.tables
where table_name ilike table_pattern
order by 1
)
$$;

Then all I have to do is run the select query, copy and paste the output, and give the view a name.

select *
from table(combine_tables('sales_division_%m'));

Output:

create or replace view named_whatever as

select * from sales_division_123m union all
select * from sales_division_124m union all
select * from sales_division_125m;

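I could tweak the function so that it also gives the view a suitable name based on the range of tables being combined, e.g. master_sales_division_123m_125m, but I left that part out for flexibility.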
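
The range-based naming mentioned above could look roughly like the following sketch. rangeViewName is a hypothetical helper, and it assumes every table name has the form <prefix>_<digits>m with the same number of digits, so that plain string sorting matches numeric order.

```javascript
// Hypothetical helper: derive a view name from the first and last table names.
// Assumes every name looks like <prefix>_<digits>m, as in this answer.
function rangeViewName(tableNames) {
    var sorted = tableNames.slice().sort();
    var first = sorted[0];
    var last = sorted[sorted.length - 1];
    // The suffix is the part after the final underscore, e.g. "125m".
    var suffix = last.slice(last.lastIndexOf("_") + 1);
    return "master_" + first + "_" + suffix;
}

// Returns "master_sales_division_123m_125m"
rangeViewName(["sales_division_124m", "sales_division_123m", "sales_division_125m"]);
```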

Merging multiple tables into one in Snowflake
