Snowflake Python connector not working with larger data sets in AWS Lambda

Asked: 2018-07-18 13:32:16

Tags: python database aws-lambda data-warehouse snowflake-datawarehouse

I'm using the Snowflake Python connector to retrieve a set of data from our data warehouse for processing. The job runs in an AWS Lambda function, and the trouble starts once the number of rows returned gets to around 20. With limit 10 or limit 20 on the query I can get the data set back; if I leave the limit off, it struggles to return a result set of just 65 rows.

The memory and timeout values on my Lambda are already maxed out, and the data set exported to CSV is only about 300 KB. Running this query locally returns results just fine, so it could be memory-related, but the data coming back really isn't that large.

import os

import snowflake.connector
from snowflake.connector import DictCursor

# /tmp is the only writable path in Lambda, so the OCSP cache goes there.
connector = snowflake.connector.connect(
    account=os.environ['SNOWFLAKE_ACCOUNT'],
    user=os.environ['SNOWFLAKE_USER'],
    password=os.environ['SNOWFLAKE_PASSWORD'],
    role="MY_ROLE",
    ocsp_response_cache_filename="/tmp/.cache/snowflake/"
                                 "ocsp_response_cache",
)
print("Connected to snowflake")
cursor = connector.cursor(DictCursor)  # rows come back as dicts
cursor.execute('USE DATA.INFORMATION_SCHEMA')

query = "SELECT * FROM TABLE WHERE X=Y"  # FAKE QUERY

print("Execute query: \n\t{0}".format(query))
cursor.execute(query)
print("Execute query done!")
posts = []
processed = 0
for rec in cursor:  # iterate over rows as they stream in
    processed += 1
    print("Processed count: {}".format(processed))
    posts.append(rec)

# These attempts also didn't work.
# posts = cursor.fetchmany(size=cursor.rowcount)
# posts = cursor.fetchall()

cursor.close()

The processed counter makes it to 17 records and then stalls. My logs print a lot of output about chunks not yet being ready to consume, and eventually the Lambda just times out:
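(For reference, the DEBUG output below comes from turning up standard-library logging; the connector logs through Python's logging module, so something along these lines is enough to reproduce it:)

import logging

# Send the Snowflake connector's DEBUG logging to stdout / CloudWatch.
logging.basicConfig(level=logging.DEBUG)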

[1531919679073] [DEBUG] 2018-07-18T13:14:39.72Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Chunk Downloader in memory
[1531919679073] Execute query done!
[1531919679073] [DEBUG] 2018-07-18T13:14:39.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 chunk index: 0, chunk_count: 2
[1531919679073] [DEBUG] 2018-07-18T13:14:39.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 next_chunk_to_consume=1, next_chunk_to_download=3, total_chunks=2
[1531919679073] [DEBUG] 2018-07-18T13:14:39.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 waiting for chunk 1/2 in 1/10 download attempt
[1531919679073] [DEBUG] 2018-07-18T13:14:39.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 chunk 1/2 is NOT ready to consume in 10/3600(s)
[1531919679073] [DEBUG] 2018-07-18T13:14:39.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 downloading chunk 1/2
[1531919679074] [DEBUG] 2018-07-18T13:14:39.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 use chunk headers from result
[1531919679074] [DEBUG] 2018-07-18T13:14:39.74Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 started getting the result set 1: https://sfc-va-ds1-customer-stage.s3.amazonaws.com/fwoi-s-vass0007/results/7b9cf772-a061-47ab-8e9f-43dbfcd923c9_0/main/data_0_0_0?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=AKIAJKHCJ73YL7MD6ZRA&Expires=1531941279&Signature=VvGOkLNvE%2FHVMaUXoeQMn6cFUOY%3D
[1531919679074] [DEBUG] 2018-07-18T13:14:39.74Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Active requests sessions: 1, idle: 0
[1531919679074] [DEBUG] 2018-07-18T13:14:39.74Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 remaining request timeout: 3600, retry cnt: 1
[1531919679074] [DEBUG] 2018-07-18T13:14:39.74Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 socket timeout: 60
[1531919679075] [INFO] 2018-07-18T13:14:39.75Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Starting new HTTPS connection (1): sfc-va-ds1-customer-stage.s3.amazonaws.com
[1531919679078] [DEBUG] 2018-07-18T13:14:39.75Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 downloading chunk 2/2
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 use chunk headers from result
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 started getting the result set 2: https://sfc-va-ds1-customer-stage.s3.amazonaws.com/fwoi-s-vass0007/results/7b9cf772-a061-47ab-8e9f-43dbfcd923c9_0/main/data_0_0_1?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=AKIAJKHCJ73YL7MD6ZRA&Expires=1531941279&Signature=F5ix8FcsLO1dM8sWsZXZYx4uHM8%3D
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Converted retries value: 1 -> Retry(total=1, connect=None, read=None, redirect=None)
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Converted retries value: 1 -> Retry(total=1, connect=None, read=None, redirect=None)
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Active requests sessions: 2, idle: 0
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 remaining request timeout: 3600, retry cnt: 1
[1531919679078] [DEBUG] 2018-07-18T13:14:39.76Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 socket timeout: 60
[1531919679078] [INFO] 2018-07-18T13:14:39.77Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 Starting new HTTPS connection (1): sfc-va-ds1-customer-stage.s3.amazonaws.com
[1531919681581] [DEBUG] 2018-07-18T13:14:41.580Z 26284dc8-8a8c-11e8-95ac-3ff42bd28642 chunk 1/2 is NOT ready to consume in 160/3600(s)
[1531919689074] [DEBUG] 2018-07-18T13:14:49.73Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 chunk 1/2 is NOT ready to consume in 20/3600(s)
[1531919691581] [DEBUG] 2018-07-18T13:14:51.581Z 26284dc8-8a8c-11e8-95ac-3ff42bd28642 chunk 1/2 is NOT ready to consume in 170/3600(s)
[1531919699074] [DEBUG] 2018-07-18T13:14:59.74Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 chunk 1/2 is NOT ready to consume in 30/3600(s)
[1531919701581] [DEBUG] 2018-07-18T13:15:01.581Z 26284dc8-8a8c-11e8-95ac-3ff42bd28642 chunk 1/2 is NOT ready to consume in 180/3600(s)
[1531919709074] [DEBUG] 2018-07-18T13:15:09.74Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 chunk 1/2 is NOT ready to consume in 40/3600(s)
[1531919711582] [DEBUG] 2018-07-18T13:15:11.581Z 26284dc8-8a8c-11e8-95ac-3ff42bd28642 chunk 1/2 is NOT ready to consume in 190/3600(s)
[1531919712739] [DEBUG] 2018-07-18T13:15:12.738Z 26284dc8-8a8c-11e8-95ac-3ff42bd28642 Incremented Retry for (url='/fwoi-s-vass0007/results/7b9cf772-a061-47ab-8e9f-43dbfcd923c9_0/main/data_0_0_0?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=AKIAJKHCJ73YL7MD6ZRA&Expires=1531941131&Signature=mW6nXerwYHhnfwfPdRF0So1tpIQ%3D'): Retry(total=0, connect=None, read=None, redirect=None)
[1531919719075] [DEBUG] 2018-07-18T13:15:19.75Z 7e3420c6-8a8c-11e8-a97e-c53a2c591430 chunk 1/2 is NOT ready to consume in 50/3600(s)

2 Answers:

Answer 0 (score: 0):

Judging from the logs, the Python connector keeps trying to download the result set from S3. That is expected behavior when a query produces a large amount of data. I'd suggest verifying that your Lambda environment can actually reach the S3 bucket; a simple curl command should confirm it:

curl -v https://sfc-va-ds1-customer-stage.s3.amazonaws.com

If you get back any HTTP status code (e.g. 403), the connection is being established. If it hangs instead, the network configuration in your environment is wrong.
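If you'd rather run the check from inside the Lambda itself, where there is no shell for curl, a minimal sketch along these lines should do the same job; the bucket hostname is taken from the log output above:

import urllib.error
import urllib.request

STAGE_URL = "https://sfc-va-ds1-customer-stage.s3.amazonaws.com"

def s3_reachable(url=STAGE_URL, timeout=10):
    """Return True if the S3 staging endpoint answers at all."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # Any HTTP response (403, 404, ...) still proves connectivity.
        return True
    except urllib.error.URLError as exc:
        # A timeout or connection failure means the network path is broken.
        print("Cannot reach S3: {}".format(exc.reason))
        return False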

Answer 1 (score: 0):

I ran into a similar issue, but with the Snowflake JDBC connector.

With select * from table, it fetched the first chunk of data (600 records) and then hit a "connection timeout" while fetching the next chunk.

If I did select * from table limit 1200 instead, it worked fine and did not time out.

So I broke the whole process into two steps (see the sketch below):

  1. rowcount = select count(*) from table
  2. select * from table limit rowcount
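Translated to the Python connector from the question, that workaround might look like this minimal sketch (assuming connector is the connection object created earlier; TABLE is a stand-in, as before):

from snowflake.connector import DictCursor

cursor = connector.cursor(DictCursor)

# Step 1: ask for the row count up front.
cursor.execute("SELECT COUNT(*) AS N FROM TABLE")  # FAKE TABLE, as above
rowcount = cursor.fetchone()["N"]

# Step 2: rerun the real query with an explicit LIMIT equal to that count.
cursor.execute("SELECT * FROM TABLE LIMIT {0}".format(rowcount))
posts = cursor.fetchall()
cursor.close()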