How to add a sort key to an existing table in AWS Redshift

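Time: 2013-07-26 14:35:50

Tags: amazon-redshift

In AWS Redshift, I want to add a sort key to a table that has already been created. Is there a command that can add a column and use it as the sort key?

7 answers:

Answer 0 (score: 28)

As Yaniv Kessler mentioned, you cannot add or change the distkey and sort key after a table has been created; you have to recreate the table and copy all the data into the new table. You can use the following SQL pattern to recreate the table with a new design.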

ALTER TABLE test_table RENAME TO old_test_table;
CREATE TABLE new_test_table([new table columns]);
INSERT INTO new_test_table (SELECT * FROM old_test_table);
ALTER TABLE new_test_table RENAME TO test_table;
DROP TABLE old_test_table;

In my experience, this SQL is useful not only for changing the distkey and sortkey, but also for setting the encoding (compression) type.
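As a concrete illustration of the pattern above, here is a minimal sketch. The column names (id, created_at, payload), their types, and the chosen encodings are assumptions made up for the example and are not from the original answer; substitute the real schema.

```sql
-- Hypothetical schema: id, created_at and payload are illustrative names only.
ALTER TABLE test_table RENAME TO old_test_table;

CREATE TABLE new_test_table (
    id         BIGINT       ENCODE delta,
    created_at TIMESTAMP    ENCODE raw,
    payload    VARCHAR(256) ENCODE lzo
)
DISTKEY (id)
SORTKEY (created_at);

-- List the columns explicitly so the copy does not depend on column order.
INSERT INTO new_test_table (id, created_at, payload)
SELECT id, created_at, payload
FROM old_test_table;

ALTER TABLE new_test_table RENAME TO test_table;
DROP TABLE old_test_table;
```

Naming the columns in the INSERT is a defensive choice: SELECT * works only while both tables declare their columns in the same order.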

Answer 1 (score: 25)

To add to Yaniv's answer, the ideal way to do this is probably the CREATE TABLE AS command. You can specify the distkey and sortkey explicitly, i.e.

CREATE TABLE test_table_with_dist 
distkey(field) 
sortkey(sortfield) 
AS 
select * from test_table

More examples:

http://docs.aws.amazon.com/redshift/latest/dg/r_CTAS_examples.html
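To confirm that the keys were applied after a CREATE TABLE AS like the one above, one option is to query the pg_table_def catalog view. This is a sketch that assumes the new table lives in a schema on the current search_path, since pg_table_def only lists such tables.

```sql
-- distkey is true for the distribution column;
-- sortkey is non-zero for columns that are part of the sort key.
SELECT "column", type, encoding, distkey, sortkey
FROM pg_table_def
WHERE tablename = 'test_table_with_dist';
```

The encoding column in the same result also shows which compression each column ended up with.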

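Edit

I've noticed that this method doesn't preserve encoding. Redshift only encodes automatically during a COPY statement. If this is a persistent table, you should redefine the table and specify the encoding.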

create table test_table_with_dist(
    field1 varchar encode raw distkey,
    field2 timestamp encode delta sortkey);

insert into test_table_with_dist select * from test_table;

You can determine which encoding to use by running analyze compression test_table;
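ANALYZE COMPRESSION only reports suggested encodings; it does not change the table. A short sketch of two common variations follows. The row count of 100000 is an arbitrary value chosen for the example, and field1 and field2 are the column names from the table definition above.

```sql
-- Base the suggestion on a sample of the given size (COMPROWS is optional).
ANALYZE COMPRESSION test_table COMPROWS 100000;

-- Restrict the report to specific columns.
ANALYZE COMPRESSION test_table (field1, field2);
```

The suggested encodings can then be written into the ENCODE clauses of the new table definition.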

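Answer 2 (score: 19)

At the moment I don't think it's possible (hopefully that will change in the future). When I ran into this situation in the past, I created a new table and copied the data from the old one into it.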

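From http://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html:

ADD [COLUMN] column_name
Adds a column with the specified name to the table. You can add only one column in each ALTER TABLE statement.

You cannot add a column that is the distribution key (DISTKEY) or a sort key (SORTKEY) of the table.

You cannot use an ALTER TABLE ADD COLUMN command to modify the following table and column attributes:

UNIQUE
PRIMARY KEY
REFERENCES (foreign key)
IDENTITY

The maximum length for the column name is 127 characters; longer names are truncated to 127 characters. The maximum number of columns you can define in a single table is 1,600.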
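To make the one-column-per-statement rule in the quoted documentation concrete, here is a small sketch. The column names created_at and source and the default value are hypothetical, chosen only for illustration.

```sql
-- Each ALTER TABLE ... ADD COLUMN adds exactly one column,
-- so two new columns take two statements.
ALTER TABLE test_table ADD COLUMN created_at TIMESTAMP;
ALTER TABLE test_table ADD COLUMN source VARCHAR(32) DEFAULT 'unknown';
```

Per the quoted text, neither column can be declared as the DISTKEY or SORTKEY as part of the ADD COLUMN itself.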

Answer 3 (score: 3)

AWS now allows you to add both a sortkey and a distkey without having to recreate the table:

To add a sort key (or change the sort key):

ALTER TABLE data.engagements_bot_free_raw ALTER SORTKEY (id)

To change the distkey or add a distkey:

ALTER TABLE data.engagements_bot_free_raw ALTER DISTKEY id

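Interestingly, the parentheses are mandatory on SORTKEY but not on DISTKEY.

You still cannot change a table's encoding in place; that still requires a solution in which the table is recreated.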
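Put together with ADD COLUMN, this covers the original question of adding a column and then sorting by it. The sketch below assumes a hypothetical new column named event_time; the final query uses the svv_table_info system view, which does not return rows for empty tables.

```sql
-- event_time is a made-up column name for illustration.
ALTER TABLE data.engagements_bot_free_raw ADD COLUMN event_time TIMESTAMP;
ALTER TABLE data.engagements_bot_free_raw ALTER SORTKEY (event_time);

-- Check the resulting distribution style and first sort key column.
SELECT "table", diststyle, sortkey1, sortkey_num
FROM svv_table_info
WHERE "schema" = 'data'
  AND "table" = 'engagements_bot_free_raw';
```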

Answer 4 (score: 1)

I followed this approach to add sort columns to my table table_transactions; it uses more or less the same number of commands.

1) alter table table_transactions rename to table_transactions_backup;
2) create table table_transactions compound sortkey(key1, key2, key3, key4) as select * from table_transactions_backup;
3) drop table table_transactions_backup;
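Before running step 3, it may be worth checking that the copy is complete. A minimal sketch of such a check, which compares row counts only and does not compare the contents of the rows:

```sql
-- Run between step 2 and step 3; the two counts should match.
SELECT
    (SELECT COUNT(*) FROM table_transactions)        AS new_rows,
    (SELECT COUNT(*) FROM table_transactions_backup) AS backup_rows;
```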

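Answer 5 (score: 1)

A bit late to this question.
I've found that using 1 = 1 is the best way to create a table and copy data into another table in Redshift, e.g.: CREATE TABLE NEWTABLE AS SELECT * FROM OLDTABLE WHERE 1 = 1;

You can then drop OLDTABLE once you have verified that the data has been copied.

(If you replace 1 = 1 with 1 = 2, it copies only the structure, which is very useful for creating staging tables.)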
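The structure-only variant might look like the sketch below. The name staging_table is hypothetical; oldtable is the source table from the example above.

```sql
-- WHERE 1 = 2 is never true, so no rows are copied, only the column layout.
CREATE TABLE staging_table AS
SELECT * FROM oldtable WHERE 1 = 2;

-- The new table should contain no rows.
SELECT COUNT(*) FROM staging_table;
```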

Answer 6 (score: 0)

It is now possible to change the sort key:

Amazon Redshift now supports changing table sort keys dynamically

Amazon Redshift now enables users to add and change sort keys of existing Redshift tables without having to re-create the table. The new capability simplifies user experience in maintaining the optimal sort order in Redshift to achieve high performance as their query patterns evolve and do it without interrupting the access to the tables.

Customers when creating Redshift tables can optionally specify one or more table columns as sort keys. The sort keys are used to maintain the sort order of the Redshift tables and allows the query engine to achieve high performance by reducing the amount of data to read from disk and to save on storage with better compression. Currently Redshift customers who desire to change the sort keys after the initial table creation will need to re-create the table with new sort key definitions.

With the new ALTER SORT KEY command, users can dynamically change the Redshift table sort keys as needed. Redshift will take care of adjusting data layout behind the scenes and table remains available for users to query. Users can modify sort keys for a given table as many times as needed and they can alter sort keys for multiple tables simultaneously.

For more information ALTER SORT KEY, please refer to the documentation.


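As for the documentation itself:

ALTER DISTKEY column_name or ALTER DISTSTYLE KEY DISTKEY column_name: A clause that changes the column used as the distribution key of a table. Consider the following: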

VACUUM and ALTER DISTKEY cannot run concurrently on the same table.

If VACUUM is already running, then ALTER DISTKEY returns an error.

If ALTER DISTKEY is running, then background vacuum doesn't start on a table.

If ALTER DISTKEY is running, then foreground vacuum returns an error.

You can only run one ALTER DISTKEY command on a table at a time.

The ALTER DISTKEY command is not supported for tables with interleaved sort keys.

When specifying DISTSTYLE KEY, the data is distributed by the values in the DISTKEY column. For more information about DISTSTYLE, see CREATE TABLE.
  

ALTER [COMPOUND] SORTKEY ( column_name [,...] ): A clause that changes or adds the sort key used for a table. Consider the following:

You can define a maximum of 400 columns for a sort key per table.

You can only alter a compound sort key. You can't alter an interleaved sort key.

When data is loaded into a table, the data is loaded in the order of the sort key. When you alter the sort key, Amazon Redshift reorders the data. For more information about SORTKEY, see CREATE TABLE.
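Following the two clause forms quoted above, a multi-column change could be sketched as below. The table and column names (test_table, field1, field2) are borrowed from an earlier answer purely for illustration.

```sql
-- Switch the table to key distribution on field1.
ALTER TABLE test_table ALTER DISTSTYLE KEY DISTKEY field1;

-- Set a compound sort key; rows are ordered by field1 first, then field2.
ALTER TABLE test_table ALTER COMPOUND SORTKEY (field1, field2);
```

Column order matters in a compound sort key, so the column that queries filter on most often usually goes first.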