COPY INTO table from Snowflake stage for multiple JSON files fails due to OSError: [WinError 145]

Time: 2018-12-12 20:00:43

Tags: json windows copy snowflake-datawarehouse

I am copying multiple JSON files uploaded from the Windows file system into my Snowflake table, but the COPY INTO command fails with a Windows OS error.


The multiple JSON files are staged from the local Windows file system in this way -

cursor.execute("put file://C:\\Users\\nrajora\\data\\*.json @my_json_stage "
               "auto_compress=true;")

Listing the staged files shows that they are ready to be copied -

cs.execute("list @my_json_stage")
all_rows = cs.fetchall()
for row in all_rows:
    print("row: " + str(row))

Output:

...

('my_json_stage/xbg.json.gz', 40480, '07790f0478b333041e57435733a6d550', 'Wed, 12 Dec 2018 19:38:03 GMT')
('my_json_stage/xbu.json.gz', 108544, 'c7e164e041a459a3c2e28d6f73c14bc5', 'Wed, 12 Dec 2018 19:38:03 GMT')
('my_json_stage/xcd.json.gz', 60096, '6ce8cbb867f17077969a3110bfa51da9', 'Wed, 12 Dec 2018 19:38:03 GMT')
('my_json_stage/xgh.json.gz', 31264, 'e46a75c0640fd59c256b654e02bf844a', 'Wed, 12 Dec 2018 19:38:03 GMT')
('my_json_stage/xgo.json.gz', 42752, 'aef9b6d6e536f794ce7f7e9429c46ff8', 'Wed, 12 Dec 2018 19:38:03 GMT')

This is the copy command -

cs.execute("copy into UCLAIM_XML_JSON from @my_json_stage "
           "pattern = '.*.json'")

It seems that Snowflake is trying to upload the multiple files via the Windows %TEMP% directory and is unable to purge that directory, because it is shared with all of the other running programs.

Edit: I tried a workaround which seems to work for up to 500 JSON files (around 150 MB of data), but for data files totaling more than 180 MB it fails with the same error:

python.exe C:\Users\nrajora\PycharmProjects\sf_poc_1\load_json_into_table_from_localfs.py
Traceback (most recent call last):
  File "C:\Users\nrajora\PycharmProjects\sf_poc_1\load_json_into_table_from_localfs.py", line 25, in <module>
    cs.execute("put file://C:\\Users\\nrajora\\Downloads\\OneClaimdata\\*.json @my_json_stage "
  File "C:\Users\nrajora\AppData\Local\Programs\Python\Python37-32\lib\site-packages\snowflake\connector\cursor.py", line 519, in execute
    sf_file_transfer_agent.execute()
  File "C:\Users\nrajora\AppData\Local\Programs\Python\Python37-32\lib\site-packages\snowflake\connector\file_transfer_agent.py", line 194, in execute
    self.upload(large_file_metas, small_file_metas)
  File "C:\Users\nrajora\AppData\Local\Programs\Python\Python37-32\lib\site-packages\snowflake\connector\file_transfer_agent.py", line 215, in upload
    self._upload_files_in_parallel(small_file_metas)
  File "C:\Users\nrajora\AppData\Local\Programs\Python\Python37-32\lib\site-packages\snowflake\connector\file_transfer_agent.py", line 264, in _upload_files_in_parallel
    target_meta)
  File "C:\Users\nrajora\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\pool.py", line 290, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\nrajora\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\pool.py", line 683, in get
    raise self._value
  File "C:\Users\nrajora\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\nrajora\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\nrajora\AppData\Local\Programs\Python\Python37-32\lib\site-packages\snowflake\connector\file_transfer_agent.py", line 371, in upload_one_file
    shutil.rmtree(tmp_dir)
  File "C:\Users\nrajora\AppData\Local\Programs\Python\Python37-32\lib\shutil.py", line 507, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Users\nrajora\AppData\Local\Programs\Python\Python37-32\lib\shutil.py", line 395, in _rmtree_unsafe
    onerror(os.rmdir, path, sys.exc_info())
  File "C:\Users\nrajora\AppData\Local\Programs\Python\Python37-32\lib\shutil.py", line 393, in _rmtree_unsafe
    os.rmdir(path)
OSError: [WinError 145] The directory is not empty: 'C:\\Users\\nrajora\\AppData\\Local\\Temp\\tmp8iw2hs7i'

Process finished with exit code 1
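The traceback shows the failure inside the connector's parallel small-file upload path (`_upload_files_in_parallel`), where a per-upload temp directory under %TEMP% is removed with `shutil.rmtree`. One possible mitigation, sketched below under the assumption that serializing the uploads avoids the temp-dir race, is to PUT the files one at a time using the documented `PARALLEL = 1` option of the PUT command (the helper name `stage_one_by_one` is made up for illustration):

```python
# Sketch of a possible workaround (an assumption, not a confirmed fix): PUT the
# files one at a time with PARALLEL = 1, so the connector's parallel upload
# pool -- whose temp-directory cleanup raises WinError 145 -- is not exercised.
import glob
import os

def stage_one_by_one(cs, local_dir, stage="@my_json_stage"):
    """Stage every *.json file in local_dir with a separate PUT command."""
    for path in sorted(glob.glob(os.path.join(local_dir, "*.json"))):
        # PARALLEL = 1 limits the client to a single upload thread per PUT
        cs.execute("put file://{} {} auto_compress=true parallel=1"
                   .format(path, stage))
```

This trades upload throughput for stability; if it works, the batch size can be raised again by grouping several files per PUT.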

This is really strange. Is there any way around this issue? Any help is appreciated!
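Since WinError 145 ("the directory is not empty") here looks like a transient race over the shared %TEMP% directory, a generic mitigation worth trying is to retry the statement after a short pause when an OSError escapes the connector. This is only a sketch under that assumption, and `put_with_retry` is a hypothetical helper, not part of the Snowflake connector API:

```python
# Retry sketch -- assumes the WinError 145 is a transient %TEMP% race that can
# succeed on a later attempt. `put_with_retry` is a hypothetical helper.
import time

def put_with_retry(cs, put_sql, attempts=3, delay=5.0):
    """Execute a PUT statement, retrying on OSError up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return cs.execute(put_sql)
        except OSError:
            if attempt == attempts:
                raise  # out of retries; surface the original error
            time.sleep(delay)  # let other processes release the temp dir
```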

0 Answers:

There are no answers.