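I'm currently working with some code that downloads CSV data in zip files, month by month for each year, and the downloaded files are stored as follows:
At the moment these folders just sit on my desktop.
Once I click into, say, the folder {{1}}, you can see a folder for each month: January, February, and so on...
So far I have tried:
substring()
but it doesn't seem to work?
Any help would be greatly appreciated.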
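Answer 0 (score: 1)
Unfortunately I have no experience with the zip module, but if you are asking how to navigate into each of these folders, I have run into a similar problem: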
import glob
import os
import zipfile

main_file = 'C:\\Users\\Folder1'  # wherever you have saved all this data in full path form
output_dir = os.path.join(main_file, 'OUTPUT')
if not os.path.isdir(output_dir):
    os.mkdir(output_dir)  # make a folder to save output

try:
    for i in range(2010, 2016 + 1):  # for years 2010-2016
        for j in range(1, 12 + 1):  # months 1-12
            # Build the full path rather than calling os.chdir(), so nothing has to be reset per loop
            data_dir = os.path.join(main_file, str(i), 'MMSDM_{0}_{1:02d}'.format(i, j),
                                    'MMSDM_Historical_Data_SQLLoader', 'DATA')
            # zipfile.ZipFile() does not expand wildcards, so let glob resolve PUBLIC_*.zip
            for zip_path in glob.glob(os.path.join(data_dir, 'PUBLIC_*.zip')):
                with zipfile.ZipFile(zip_path) as z:
                    # do stuff with zip file here; as a stand-in, unpack everything into OUTPUT
                    z.extractall(output_dir)
except Exception as e:
    print('Exception occurred: {}'.format(e))
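I can't verify that it works, since I obviously don't have the files on my PC, and there are still some blanks such as "do stuff here", but hopefully this helps put you on the right track! Let me know if you need further clarification.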
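One possible way to fill in the "do stuff with zip file here" step is to read the CSV members straight from the archive, without unpacking them to disk first. This is only a sketch: `read_csv_rows_from_zip` is a made-up helper name, it targets Python 3, and it assumes the CSV files are UTF-8 encoded, which may not hold for your data.

```python
import csv
import io
import zipfile


def read_csv_rows_from_zip(zip_path):
    """Yield (member_name, row) for every row of every .csv member in the archive."""
    with zipfile.ZipFile(zip_path) as z:
        for name in z.namelist():
            if not name.lower().endswith('.csv'):
                continue  # skip members that are not CSV files
            with z.open(name) as raw:
                # z.open() gives a binary stream; wrap it so csv.reader receives text.
                # The utf-8 encoding is an assumption, adjust it to match the real files.
                text = io.TextIOWrapper(raw, encoding='utf-8', newline='')
                for row in csv.reader(text):
                    yield name, row
```

Each `row` is a list of strings, so inside the loop you could filter rows, collect them, or write them to a file in the OUTPUT folder.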
Answer 1 (score: 1)
This seems to be more about file system traversal than about zipfile. For the traversal you can use [Python 3]: glob - Unix style pathname pattern expansion, and handle the .zip files with [Python 3]: zipfile - Work with ZIP archives.
For more details on traversing directories, check [SO]: How do I list all files of a directory? (@CristiFati's answer).
code.py:
#!/usr/bin/env python3

import sys
import os
import glob
import zipfile


INPUT_DIR = ".\\InDir"
OUTPUT_DIR = ".\\OutDir"


def get_zip_files(path, start_pattern):  # Python 3.5 + !!!
    # Search recursively below path for files named <start_pattern>*.zip
    return glob.iglob(os.path.join(path, "**", start_pattern + "*.zip"), recursive=True)


def main():
    for item in get_zip_files(INPUT_DIR, "PUBLIC_"):
        print("Found .zip file that matches pattern: {:s}".format(item))
        with zipfile.ZipFile(item) as zf:  # the archive is closed when the block ends
            for name in zf.namelist():
                if name.lower().endswith(".csv"):
                    print(" Extracting {:s}".format(name))
                    zf.extract(name, path=OUTPUT_DIR)


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()
Output:
e:\Work\Dev\StackOverflow\q054498244>dir /b
code.py
InDir
OutDir

e:\Work\Dev\StackOverflow\q054498244>dir /b /s InDir
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir00
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir01
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir00\OTHER_FILE.zip
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir00\PUBLIC_DVD_DISPATCH_UNIT_SCDATA_00.zip
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir01\PUBLIC_DVD_DISPATCH_UNIT_SCDATA_01.zip

e:\Work\Dev\StackOverflow\q054498244>dir /b OutDir

e:\Work\Dev\StackOverflow\q054498244>"e:\Work\Dev\VEnvs\py_064_03.06.08_test0\Scripts\python.exe" code.py
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32

Found .zip file that matches pattern: .\InDir\Dir0\Dir00\PUBLIC_DVD_DISPATCH_UNIT_SCDATA_00.zip
 Extracting PUBLIC_DVD_DISPATCH_UNIT_SCDATA_0.csv
Found .zip file that matches pattern: .\InDir\Dir0\Dir01\PUBLIC_DVD_DISPATCH_UNIT_SCDATA_01.zip
 Extracting PUBLIC_DVD_DISPATCH_UNIT_SCDATA_1.csv

e:\Work\Dev\StackOverflow\q054498244>dir /b OutDir
PUBLIC_DVD_DISPATCH_UNIT_SCDATA_0.csv
PUBLIC_DVD_DISPATCH_UNIT_SCDATA_1.csv
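A side note on the extraction step: zf.extract keeps whatever folder path is stored inside the archive, so a member named sub/a.csv would end up in a sub folder below the output directory. In the run above the members sit at the top level of each archive, so this does not show. If the .csv files should always land directly in the output folder, something like the sketch below might do. `extract_csv_flat` is a name made up for illustration and is not part of code.py.

```python
import os
import shutil
import zipfile


def extract_csv_flat(zip_path, output_dir):
    """Copy every .csv member of zip_path directly into output_dir; return the written paths."""
    written = []
    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue  # only CSV members are of interest here
            # Drop any folder part stored in the archive.
            # Caveat: members with the same file name overwrite each other.
            target = os.path.join(output_dir, os.path.basename(name))
            with zf.open(name) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Copying the member stream by hand trades the convenience of zf.extract for control over the target file name.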
@EDIT0:
For Python 2 compatibility, simply replace get_zip_files with the version below:
def get_zip_files(path, start_pattern):
    start_pattern_lower = start_pattern.lower()
    entries = os.listdir(path)
    for entry in entries:
        entry_lower = entry.lower()
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            for sub_entry in get_zip_files(entry_with_path, start_pattern):
                yield sub_entry
        else:
            if entry_lower.startswith(start_pattern_lower) and entry_lower.endswith(".zip"):
                yield entry_with_path
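If you would rather not write the recursion by hand, the same traversal can probably be expressed with os.walk and fnmatch, both of which exist in Python 2 and Python 3. This is a sketch under that assumption, not a tested drop-in for the code above. `get_zip_files_walk` is a made-up name, and start_pattern is treated as part of a wildcard pattern, so characters such as * or [ in it would act as wildcards.

```python
import fnmatch
import os


def get_zip_files_walk(path, start_pattern):
    """Yield paths of .zip files below path whose names start with start_pattern (case-insensitive)."""
    pattern = start_pattern.lower() + "*.zip"
    for dir_path, _dir_names, file_names in os.walk(path):
        for file_name in file_names:
            # Lowercasing both sides and using fnmatchcase keeps the match
            # case-insensitive regardless of the operating system
            if fnmatch.fnmatchcase(file_name.lower(), pattern):
                yield os.path.join(dir_path, file_name)
```

It is called the same way as get_zip_files, for example `for item in get_zip_files_walk(INPUT_DIR, "PUBLIC_"):`.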