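I have many files compressed in bz2 format, and I am trying to use Python to decompress them into a temporary directory and then analyze them. There are thousands of files, so decompressing them by hand is not feasible, which is why I wrote the script below.
My problem is that whenever I run it, the largest output file is 900 kB, even though each file is about 6 MB when I decompress it manually. I am not sure whether this is a flaw in my code, in the way I save the data as a string and then copy it to a file, or something else. I have tried other files, and I know it works for files smaller than 900 kB. Has anyone else run into something similar and found a solution?
My code is as follows: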
import numpy as np
import bz2
import os
import glob

def unzip_f(filepath):
    '''
    Input a filepath specifying a group of Himawari .bz2 files with common names
    Outputs the path of all the temporary files that have been uncompressed
    '''
    cpath = os.getcwd()  # get current path
    filenames_ = []  # list to add filenames to for future use
    for zipped_file in glob.glob(filepath):  # loop over the files that meet the name criteria
        with bz2.BZ2File(zipped_file, 'rb') as zipfile:  # read in the bz2 files
            newfilepath = cpath + '/temp/' + zipped_file[-47:-4]  # create a temporary file
            with open(newfilepath, "wb") as tmpfile:  # open the temporary file
                for i, line in enumerate(zipfile.readlines()):
                    tmpfile.write(line)  # write the data from the compressed file to the temporary file
        filenames_.append(newfilepath)
    return filenames_

path_ = 'test/HS_H08_20180930_0710_B13_FLDK_R20_S*bz2'
unzip_f(path_)
It returns the correct file paths, but the file sizes are wrong: they are capped at 900 kB.
Answer 0 (score: 0)
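It turns out the problem is that the files are multi-stream, which the bz2 module in Python 2.7 cannot handle. jasonharper mentioned this, and there is more information here and here. Below is a solution that decompresses the bz2 files with the Unix bzip2 command and then moves them into the temporary directory I want. It is not pretty, but it works.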
import os
import glob
import shutil
import subprocess

def unzip_f(filepath):
    '''
    Input a filepath specifying a group of Himawari .bz2 files with common names
    Outputs the path of all the temporary files that have been uncompressed
    '''
    cpath = os.getcwd()  # get current path
    filenames_ = []  # list to add filenames to for future use
    newfilepath = os.path.join(cpath, 'temp')  # temporary directory for the decompressed files
    for zipped_file in glob.glob(filepath):  # loop over the files that meet the name criteria
        unzipped_file = zipped_file[:-4]  # bzip2 -d writes X next to X.bz2 (same directory)
        newfilename = os.path.join(newfilepath, os.path.basename(unzipped_file))
        # -k keeps the original .bz2, -d decompresses; check_call waits for bzip2
        # to finish (os.popen does not) and raises an error if bzip2 fails
        subprocess.check_call(['bzip2', '-kd', zipped_file])
        shutil.move(unzipped_file, newfilename)  # move the result into temp/
        filenames_.append(newfilename)
    return filenames_

path_ = 'test/HS_H08_20180930_0710_B13_FLDK_R20_S0*bz2'
unzip_f(path_)
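The multi-stream explanation can also be handled without shelling out. The bzip2 format allows several independent compressed streams to be concatenated in one file (parallel compressors such as pbzip2 write files this way), and a reader that stops at the end of the first stream silently returns only the first part; the 900 kB ceiling in the question is at least consistent with each stream holding one maximum-size 900 kB bzip2 block. The following is a minimal sketch, assuming Python 3.3+ (it relies on `BZ2Decompressor.eof`); `decompress_multistream` is a hypothetical helper name, not part of any library. It starts a fresh `BZ2Decompressor` whenever the current one reports end-of-stream and feeds it the leftover `unused_data`.

```python
import bz2


def decompress_multistream(src_path, dst_path, chunk_size=1 << 20):
    """Hypothetical helper: decompress every bzip2 stream in src_path into dst_path.

    Returns the number of complete streams found. Assumes Python 3.3+,
    where BZ2Decompressor exposes the .eof attribute.
    """
    streams = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        decomp = bz2.BZ2Decompressor()
        pending = b''  # compressed bytes not yet fed to a decompressor
        while True:
            if not pending:
                pending = src.read(chunk_size)
                if not pending:
                    break  # reached the end of the input file
            dst.write(decomp.decompress(pending))
            pending = b''
            if decomp.eof:
                streams += 1
                # bytes past the end of this stream belong to the next stream
                pending = decomp.unused_data
                decomp = bz2.BZ2Decompressor()
    return streams
```

For example, `decompress_multistream('in.bz2', 'out')` returns how many streams it decoded; a result greater than 1 would confirm that a file is multi-stream.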
Answer 1 (score: 0)
This is a known limitation in Python 2, where the BZ2File class does not support multiple streams.
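It can easily be solved by using bz2file (https://pypi.org/project/bz2file/), a backport of the Python 3 implementation that works as a drop-in replacement.
After running pip install bz2file, you can use it in place of bz2:
import bz2file as bz2
and everything should just work :)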
Original Python bug report: https://bugs.python.org/issue1625
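On Python 3.3 and later, neither workaround is needed: the standard library's bz2 module gained multi-stream support in 3.3, so `bz2.open` reads all concatenated streams. The following is a minimal sketch of the question's `unzip_f` rewritten for Python 3. The `out_dir` parameter is an assumption added for illustration (the original built the path from `os.getcwd()` and a fixed `zipped_file[-47:-4]` slice), and `shutil.copyfileobj` streams the data in chunks instead of loading it with `readlines()`.

```python
import bz2
import glob
import os
import shutil


def unzip_f(filepath, out_dir='temp'):
    """Decompress every .bz2 file matching the glob pattern into out_dir.

    Returns the paths of the decompressed files. Assumes Python 3.3+,
    where bz2.open reads multi-stream files completely.
    """
    os.makedirs(out_dir, exist_ok=True)  # create the temporary directory if needed
    filenames_ = []
    for zipped_file in glob.glob(filepath):
        # strip the ".bz2" suffix instead of slicing a fixed number of characters
        name = os.path.basename(zipped_file)
        if name.endswith('.bz2'):
            name = name[:-4]
        newfilepath = os.path.join(out_dir, name)
        with bz2.open(zipped_file, 'rb') as src, open(newfilepath, 'wb') as dst:
            # copy in chunks so large files are never held in memory at once
            shutil.copyfileobj(src, dst)
        filenames_.append(newfilepath)
    return filenames_
```

Called as `unzip_f('test/HS_H08_20180930_0710_B13_FLDK_R20_S*bz2')`, it would write the decompressed files into `temp/` relative to the current directory.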