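I am trying to use Python's `glob` with wildcards to grab various files, rather than spelling out the path of each file.
In this case, I am trying to capture all files in a directory whose names start with file_.
In the future, though, there may be cases where I need to pick files from a directory based on file extension (i.e. all .csv and .log).
Below is the Python script I am using. It only captures the FULL PATH together with the desired file. I want to iterate over the file names themselves, not the paths.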
import os
import glob
import boto3
from botocore.client import Config
ACCESS_KEY_ID = 'some_key'
ACCESS_SECRET_KEY = 'some_key'
BUCKET_NAME = 'some_bucket'
s3 = boto3.client(
    's3',
    aws_access_key_id=ACCESS_KEY_ID,
    aws_secret_access_key=ACCESS_SECRET_KEY,
    config=Config(signature_version='s3v4')
)
csv_files = glob.glob('/home/user/folder1/folder2/*.csv')
#json_files = glob.glob("/home/user/folder1/h_log_*.json")
for filename in csv_files:
    print("Putting %s" % filename)
    s3.upload_file(filename, BUCKET_NAME, 'new_folder' + '/' + filename)
#for filename in json_files:
# print("Putting %s" % filename)
# s3.upload_file(filename, BUCKET_NAME, filename)
print("All_Finished")
####################################################
####################################################
The line of the script I would prefer to update is this one:
csv_files = glob.glob('/home/user/folder1/folder2/*.csv')
An example of a directory containing various files and file types:
Here I need to grab all files that end in `.csv`:
/home/user/Desktop/folder_example/
file_1.csv
file_1.csv
file_1.csv
file_1.csv
Here I need to grab all files that start with `file_`:
/home/user/Desktop/folder_example/
file_2.log
file_2.csv
file_2.log
file_2.csv
Answer 0 (score: 0)
How about using os.path.basename? You can combine glob with this function to get what you want:
[os.path.basename(item) for item in glob.glob("/home/user/folder1/folder2/*.csv")]
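Applied to the upload loop in the question, the base name can serve as the tail of the S3 key while the full path is still what gets read from disk. The following is only a minimal sketch: `build_s3_key` and its `prefix` parameter are hypothetical names introduced here, the two paths are stand-ins for the result of `glob.glob(...)`, and the real `s3.upload_file` call is left as a comment so the snippet runs without AWS credentials.

```python
import os
import posixpath


def build_s3_key(local_path, prefix="new_folder"):
    """Return an S3 object key made of `prefix` plus the file's base name.

    Hypothetical helper, not part of boto3.
    """
    # os.path.basename drops the directory part of the local path.
    name = os.path.basename(local_path)
    # S3 keys conventionally use forward slashes, so join with posixpath.
    return posixpath.join(prefix, name)


# Example paths standing in for the result of glob.glob(...)
csv_files = [
    "/home/user/folder1/folder2/file_1.csv",
    "/home/user/folder1/folder2/file_2.csv",
]

keys = [build_s3_key(path) for path in csv_files]
for path, key in zip(csv_files, keys):
    print("Putting %s as %s" % (path, key))
    # s3.upload_file(path, BUCKET_NAME, key)  # full path to read, short key to store
```

Keeping the full path for reading and deriving the key separately avoids having to change the working directory before uploading.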
Answer 1 (score: 0)
You can split the glob output on the path separator ('/' or '\') and then keep the last part.
import glob
import os
target_path = r"/home/user/folder1/folder2"
fpaths = glob.glob(target_path+os.sep+'*.csv')
[fp.split(os.sep)[-1] for fp in fpaths]
import glob, os
# Make Demo Files and a Demo Folder
target_path = os.path.join(os.getcwd(), 'temp_dump')
if not os.path.exists(target_path):
    os.makedirs(target_path)
print(os.listdir(os.getcwd()))
file_names = ['file_{}.{}'.format(fnum, fext) for fnum in range(5) for fext in ['csv', 'txt', 'log']]
for file_name in file_names:
    fpath = os.path.join(target_path, file_name)
    with open(fpath, 'w') as f:
        f.write(file_name)
print(sorted(os.listdir(target_path)))
Output:
['file_0.csv', 'file_0.log', 'file_0.txt',
'file_1.csv', 'file_1.log', 'file_1.txt',
'file_2.csv', 'file_2.log', 'file_2.txt',
'file_3.csv', 'file_3.log', 'file_3.txt',
'file_4.csv', 'file_4.log', 'file_4.txt']
File names of the .csv files (no path, just the name):
fpaths = glob.glob(target_path+os.sep+'*.csv')
[fp.split(os.sep)[-1] for fp in fpaths]
Output:
['file_0.csv', 'file_3.csv', 'file_2.csv', 'file_1.csv', 'file_4.csv']
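One caveat with splitting on `os.sep`: on Windows the separator is a backslash, so a path that was written with forward slashes may contain nothing to split on. The sketch below illustrates this using `ntpath`, the Windows flavour of `os.path` from the standard library, so it behaves the same on any platform. The `C:/...` path is a made-up example, not one from the question.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

# A hypothetical path written with forward slashes.
path = "C:/home/user/folder1/folder2/file_0.csv"

# Splitting on the Windows separator ('\\') finds nothing to split on,
# so the "last part" is the whole string.
by_split = path.split(ntpath.sep)[-1]

# basename treats both '/' and '\\' as separators on Windows.
by_basename = ntpath.basename(path)

print(by_split)
print(by_basename)
```

For that reason `os.path.basename` is probably the safer choice if the script may ever run on more than one operating system.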
Answer 2 (score: 0)
Since there are only two types of files in the folder, you can read each file type separately.
csv_files = glob.glob( os.path.join('/home/user/Desktop/folder_example/', '*.csv') )
log_files = glob.glob( os.path.join('/home/user/Desktop/folder_example/', '*.log') )
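If the list of patterns grows, the two calls can be folded into a loop. This is a rough sketch rather than a definitive approach: `list_names` is a hypothetical helper, and the demo files are created in a temporary directory so nothing here depends on the folders named in the question. It covers both cases from the question, selection by extension and selection by the `file_` prefix.

```python
import glob
import os
import tempfile


def list_names(directory, patterns):
    """Return sorted, de-duplicated base names matching any of the glob patterns."""
    names = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(directory, pattern)):
            names.add(os.path.basename(path))
    return sorted(names)


# Demo in a throwaway directory so the snippet does not depend on real files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["file_1.csv", "file_2.log", "other.csv", "notes.txt"]:
        with open(os.path.join(demo_dir, name), "w") as f:
            f.write(name)

    by_extension = list_names(demo_dir, ["*.csv", "*.log"])
    by_prefix = list_names(demo_dir, ["file_*"])

print(by_extension)
print(by_prefix)
```

A set is used so that a file matching more than one pattern is reported only once.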
Answer 3 (score: 0)
You can use the pathlib library with Python >= 3.5. Path.glob() returns a generator that you can iterate over.
from pathlib import Path
path_generator = Path('/home/user/folder1/folder2').glob('*.csv')
[p.name for p in path_generator]
Output:
['file_0.csv',
'file_1.csv',
'file_2.csv',
'file_3.csv',
'file_4.csv']
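For the "all .csv and .log" case mentioned in the question, one option with pathlib is to filter on `Path.suffix` instead of calling `glob()` once per extension. The sketch below assumes that approach; `names_with_suffix` is a hypothetical helper and the files are created in a temporary directory purely for the demo. The result is sorted explicitly because directory iteration order is not guaranteed.

```python
import tempfile
from pathlib import Path


def names_with_suffix(directory, suffixes):
    """Return sorted names of regular files whose extension is in `suffixes`."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix in suffixes
    )


# Demo in a throwaway directory so the snippet does not depend on real files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["file_0.csv", "file_0.log", "file_0.txt"]:
        (Path(demo_dir) / name).write_text(name)

    selected = names_with_suffix(demo_dir, {".csv", ".log"})

print(selected)
```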