How do I transpose a table in Redshift?

Time: 2016-11-08 08:13:30

Tags: pandas amazon-redshift transpose

Is there a better way to transpose a table in Redshift itself, rather than pulling the data out into pandas and doing the job there?

2 Answers:

Answer 0 (score: 0)

You can try the query below. It only helps, however, when the number of rows per group is a known, constant value.

SELECT col1, 
       Split_part(col_values, ',', 1) col_value1, 
       Split_part(col_values, ',', 2) col_value2, 
       Split_part(col_values, ',', 3) col_value3 
FROM  (SELECT col1, 
              Listagg(col2, ',') 
                within GROUP (ORDER BY col2) col_values 
       FROM   (SELECT col1, 
                      col2 
               FROM   table1) derived_table1
       GROUP  BY col1) derived_table2

Note: the query above works if every group has the same number of rows, or if you know the maximum number of rows per group (add one SPLIT_PART column per possible row).
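
For example, here is a minimal sketch of what the query does, run against a throwaway sample table (the table name, column names, and values are hypothetical and only illustrate the pivot):

-- Hypothetical sample data
CREATE TEMP TABLE table1 (col1 VARCHAR(10), col2 VARCHAR(10));
INSERT INTO table1 VALUES ('a', 'x'), ('a', 'y'), ('b', 'z');

SELECT col1, 
       Split_part(col_values, ',', 1) col_value1, 
       Split_part(col_values, ',', 2) col_value2, 
       Split_part(col_values, ',', 3) col_value3 
FROM  (SELECT col1, 
              Listagg(col2, ',') 
                within GROUP (ORDER BY col2) col_values 
       FROM   table1 
       GROUP  BY col1) derived_table

-- Expected result (SPLIT_PART returns '' when a group has fewer rows):
-- col1 | col_value1 | col_value2 | col_value3
-- a    | x          | y          |
-- b    | z          |            |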

Answer 1 (score: 0)

Transposing data in SQL is never a fun task. However, if the data actually fits in memory in pandas, this package may make the process smoother:

https://github.com/agawronski/pandas_redshift

An example of how to do this:

pip install pandas-redshift

import pandas_redshift as pr

# Provide your redshift credentials and connect to redshift
pr.connect_to_redshift(dbname = <dbname>,
                        host = <host>,
                        port = <port>,
                        user = <user>,
                        password = <password>)

# This next step reads the data from redshift to your python session
data = pr.redshift_to_pandas('select * from gawronski.nba_shots_log')

# Transpose the DataFrame
data_transposed = data.transpose()

# Provide S3 credentials 
# (data goes to S3 then redshift so an S3 bucket is necessary)
pr.connect_to_s3(aws_access_key_id = <aws_access_key_id>,
                aws_secret_access_key = <aws_secret_access_key>,
                bucket = 'my-bucket-name',
                # this is an optional parameter:
                subdirectory = 'subdirectory-in-the-bucket')

# Write the transposed DataFrame back to redshift
pr.pandas_to_redshift(data_frame = data_transposed, 
                      redshift_table_name = 'public.my_transposed_table')

Edit: updated to avoid a link-only answer.