Is there a better way to transpose a table in Redshift, rather than pulling it out into Pandas and doing the work there?
Answer 0 (score: 0)
You can try the query below. However, it only helps if the number of rows per group is a constant value.
SELECT col1,
       Split_part(col_values, ',', 1) col_value1,
       Split_part(col_values, ',', 2) col_value2,
       Split_part(col_values, ',', 3) col_value3
FROM   (SELECT col1,
               Listagg(col2, ',')
                 within GROUP (ORDER BY col2) col_values
        FROM   (SELECT col1,
                       col2
                FROM   table1) derived_table1
        GROUP  BY col1) derived_table2
Note: the above query works only if every group has the same number of rows, or if you know the maximum number of rows per group (and add one Split_part column per possible position).
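To make the mechanics concrete, here is a minimal, hypothetical walk-through of the same pattern. The table and column names (table1, col1, col2) match the query above; the sample rows and the expected output shown in the comments are illustrative assumptions only.

-- Hypothetical sample data: at most 3 values of col2 per col1
CREATE TEMP TABLE table1 (col1 varchar(10), col2 varchar(10));

INSERT INTO table1 VALUES
  ('A', 'x'), ('A', 'y'), ('A', 'z'),
  ('B', 'p'), ('B', 'q');

-- The inner Listagg step collapses each group into one comma-separated string:
--   A | 'x,y,z'
--   B | 'p,q'
-- Split_part then fans those strings back out into fixed columns, so the
-- full query above returns:
--   col1 | col_value1 | col_value2 | col_value3
--   A    | x          | y          | z
--   B    | p          | q          |            (empty where a group has fewer rows)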
Answer 1 (score: 0)
Transposing data in SQL is never a fun task. However, if the data actually fits in memory in pandas, this package may make the process smoother:
https://github.com/agawronski/pandas_redshift
An example of how to do this:
pip install pandas-redshift

import pandas_redshift as pr

# Provide your Redshift credentials and connect to Redshift
pr.connect_to_redshift(dbname = <dbname>,
                       host = <host>,
                       port = <port>,
                       user = <user>,
                       password = <password>)

# This next step reads the data from Redshift into your Python session
data = pr.redshift_to_pandas('select * from gawronski.nba_shots_log')

# Transpose the DataFrame
data_transposed = data.transpose()

# Provide S3 credentials
# (data goes to S3 then Redshift, so an S3 bucket is necessary)
pr.connect_to_s3(aws_access_key_id = <aws_access_key_id>,
                 aws_secret_access_key = <aws_secret_access_key>,
                 bucket = 'my-bucket-name',
                 # this is an optional parameter:
                 subdirectory = 'subdirectory-in-the-bucket')

# Write the transposed DataFrame back to Redshift
pr.pandas_to_redshift(data_frame = data_transposed,
                      redshift_table_name = 'public.my_transposed_table')
Edit: updated to avoid a link-only answer.