For some reason, the code I wrote does not work.
import pandas as pd
import glob
import zipfile
path = r"C:/Users/nano/Documents/Project" # use your path
all_files = glob.glob(path + "/*.gz")
for folder in all_files:
    with zipfile.ZipFile(folder,"r") as zip_ref:
        zip_ref.extractall(path)
Answer 0 (score: 1)
First, you are using the zip library on gzip files, so you need to switch to the correct library. Below is a working example of the code.
import glob
import os
import gzip
path = r"C:/Temp/Unzip" # use your path
all_files = glob.glob(path + "/*.gz")
print(all_files)
for file in all_files:
    directory, filename = os.path.split(file)
    filename = os.path.splitext(filename)[0]  # strip the ".gz" extension
    with gzip.open(file, "rb") as gz:
        with open('{0}/{1}.csv'.format(directory, filename), 'wb') as cv:
            cv.write(gz.read())  # write the decompressed bytes in one call
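As a minimal sketch of why the original attempt fails, the two formats can be told apart by their contents instead of their extensions. A gzip stream starts with the bytes 1f 8b, and the standard library's zipfile.is_zipfile checks for a zip archive. The helper name detect_archive_type is hypothetical and only used for illustration.

```python
import zipfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def detect_archive_type(path):
    """Return 'gzip', 'zip', or 'unknown' based on the file contents.

    Hypothetical helper for illustration; it inspects the data itself,
    so a misleading file extension does not matter.
    """
    with open(path, "rb") as fh:
        if fh.read(2) == GZIP_MAGIC:
            return "gzip"
    if zipfile.is_zipfile(path):
        return "zip"
    return "unknown"
```

Calling this on each file returned by glob before choosing between gzip.open and zipfile.ZipFile would avoid handing a .gz file to the zip library.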
Answer 1 (score: 1)
gzip (.gz) and zip (.zip) are two different things. For gzip, you can use the gzip module:
import glob
import gzip
import shutil
path = r"C:/Users/shedez/Documents/Project" # use your path
all_files = glob.glob(path + "/*.gz")
for folder in all_files:
    dst=folder[:-3] # destination file name
    with gzip.open(folder, 'rb') as f_in, open(dst, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
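The same streaming approach can be wrapped in a reusable function. This is only a sketch: the name gunzip_all is hypothetical, and it assumes each decompressed file should be written next to its source with the ".gz" suffix removed.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(directory):
    """Decompress every *.gz file in `directory` next to its source.

    Hypothetical helper; returns the list of paths that were written.
    """
    written = []
    for src in sorted(Path(directory).glob("*.gz")):
        dst = src.with_suffix("")  # "data.csv.gz" -> "data.csv"
        with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
            # copyfileobj streams in chunks, so large files are not
            # loaded into memory all at once
            shutil.copyfileobj(f_in, f_out)
        written.append(dst)
    return written
```

Because the data is copied in chunks, this should scale to files larger than available memory, unlike reading the whole stream with gz.read().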
Answer 2 (score: 0)
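If you are working with the gz (gzip) format, you may want to look at the gzip package. I don't know of an extract method there, but you can do something similar with plain pandas, which I find more convenient:
for folder in all_files:
    c = pd.read_csv(folder, compression='gzip')
    c.to_csv(folder[:-2] + "csv")  # glob already returns the full path, so path is not prepended
The [:-2] slice cuts off the trailing "gz". You may want to change the parameters of read_csv (to add a header row or something similar) or the flags of to_csv (set header=False, index_label=False) to prevent pandas from adding things you don't want. Pass index=False as well if you do not want the row index written.
Alternatively, you can use gzip.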
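A slightly fuller sketch of the pandas route is shown below. The function name gz_csvs_to_csv is hypothetical, and it assumes every .gz file in the directory is a gzip-compressed CSV with a header row.

```python
import glob
import os

import pandas as pd


def gz_csvs_to_csv(directory):
    """Re-save every gzip-compressed CSV in `directory` as a plain .csv file.

    Hypothetical helper; returns the list of paths that were written.
    """
    written = []
    for src in sorted(glob.glob(os.path.join(directory, "*.gz"))):
        base = src[:-3]  # drop the trailing ".gz"
        # avoid producing "name.csv.csv" when the source is "name.csv.gz"
        dst = base if base.endswith(".csv") else base + ".csv"
        df = pd.read_csv(src, compression="gzip")
        df.to_csv(dst, index=False)  # index=False keeps the row index out of the file
        written.append(dst)
    return written
```

Note that this parses and re-serializes the data, so it is slower than a plain byte copy and may change formatting details such as quoting or float representation.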
Answer 3 (score: -1)
Try the following code:
import os, zipfile
dir_name = 'C:\\Users\\shedez\\Documents\\Project' # ZIP location
extract_dir_name = 'C:\\Users\\shedez\\Documents\\Project\\Unziped' # CSV location after unzip
extension = ".zip" # you might have to change this
os.chdir(dir_name) # change directory from working dir to dir with files
for item in os.listdir(dir_name): # loop through items in dir
    if item.endswith(extension): # check for ".zip" extension
        file_name = os.path.abspath(item) # get full path of files
        zip_ref = zipfile.ZipFile(file_name) # create zipfile object
        zip_ref.extractall(extract_dir_name) # extract file to dir
        zip_ref.close() # close file
If you want to learn more about zipfile, see its documentation.