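I have a directory containing a large number of folders, and I want to build a Cartesian product list from all the files in each folder separately, so that every folder gets its own list.
I can do this for a single folder like this: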
import pandas as pd
import os, glob, itertools
path =(r'C:\pathway')
allfiles = glob.glob(path + "/*.csv")
result = list(itertools.product(allfiles,allfiles))
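I can also iterate over all the files in all the folders like this:
path =(r'C:\pathway')
for subdir, dirs, files in os.walk(path):
    for file in files:
        df=pd.read_csv(os.path.join(subdir,file))
But I'm not sure how to build a separate Cartesian list for the files in each folder.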
Answer 0 (score: 1)
If you want to apply your approach to every subfolder of a directory, you can use the following code:
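import glob, itertools, os

directory = "/Users/bla/asd"
# next(os.walk(directory)) returns (dirpath, dirnames, filenames) for the top
# level only; index 1 is the list of immediate subfolder names.
folders_arr = next(os.walk(directory))[1]
results = []
for folder_name in folders_arr:
    path = os.path.join(directory, folder_name)
    # Match the CSV files inside the folder (as in the question),
    # not the folder path itself.
    allfiles = glob.glob(os.path.join(path, "*.csv"))
    results.append(list(itertools.product(allfiles, allfiles)))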
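The results list above does not record which folder each product came from. A minimal sketch that keeps that association by returning a dictionary keyed by folder name is shown here. The function name `products_by_subfolder` and the default `"*.csv"` pattern are assumptions made for illustration (the pattern is taken from the question), and only immediate subfolders are considered.

```python
from itertools import product
from pathlib import Path


def products_by_subfolder(root, pattern="*.csv"):
    """Map each immediate subfolder name of `root` to the Cartesian
    product of its matching files with themselves."""
    result = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # skip files that sit directly in `root`
        # Sort so the order of the pairs is reproducible.
        files = sorted(str(p) for p in folder.glob(pattern))
        result[folder.name] = list(product(files, files))
    return result
```

With this shape, `products_by_subfolder(directory)["some_folder"]` would give the pairs for one folder, where `some_folder` is a placeholder for one of your folder names.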
Answer 1 (score: 1)
glob supports multiple wildcards, so you can build the Cartesian product like this:
from glob import glob
from os.path import join
from itertools import product
BASE_PATH = r'C:\pathway'  # raw string, so the backslash is taken literally
all_files = glob(join(BASE_PATH, '*', '*.csv')) # C:\pathway\*\*.csv
result = list(product(all_files, all_files))
From the docs (emphasis mine):
pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards.
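Note that a single product over the combined glob result pairs files across folders as well. If one list per folder is wanted, the matched paths can be grouped by their parent directory first. The following is a rough sketch of that idea; the helper name `products_grouped_by_folder` is hypothetical and not part of glob or itertools.

```python
import os
from collections import defaultdict
from itertools import product


def products_grouped_by_folder(paths):
    """Group file paths by parent directory, then build one Cartesian
    self-product per directory."""
    by_folder = defaultdict(list)
    for p in paths:
        by_folder[os.path.dirname(p)].append(p)
    return {
        folder: list(product(files, files))
        for folder, files in by_folder.items()
    }
```

It could be fed the same glob result, for example `products_grouped_by_folder(glob(join(BASE_PATH, '*', '*.csv')))`.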
Answer 2 (score: 1)
Breaking this down into modular components will help you make sense of similar code in the future:
import os

def flatFolders(rootPath):
    '''
    Given a root path (folder) containing deeply nested folders,
    returns a dictionary {folder1:[files of folder1],
                          folder2:[files of folder2], ...}
    '''
    foldersToFiles = {}
    # os.walk visits rootPath and every folder nested below it
    # (recursion would work here as well)
    for path, dirs, files in os.walk(rootPath):
        # storing full paths is an assumption; the original left this elided
        foldersToFiles[path] = [os.path.join(path, f) for f in files]
    return foldersToFiles

def cartesianSelfProduct(lst):
    '''
    cartesianSelfProduct([1,2,3]) ->
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
    '''
    return [(x,y) for x in lst for y in lst]

def flatFolderPairs(rootPath):
    foldersToFiles = flatFolders(rootPath)
    # .items() is required to iterate over (folder, files) pairs
    return {folder:cartesianSelfProduct(files) for folder,files in foldersToFiles.items()}
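As a side note, the comprehension in cartesianSelfProduct produces the same pairs, in the same order, as itertools.product with repeat=2. A small sketch of that equivalent form follows; the snake_case name `cartesian_self_product` is used here only to keep it distinct from the function above.

```python
from itertools import product


def cartesian_self_product(lst):
    """Return all ordered pairs (x, y) with x and y drawn from lst."""
    return list(product(lst, repeat=2))


pairs = cartesian_self_product([1, 2, 3])
# Same result as the nested comprehension used in the answer.
same = pairs == [(x, y) for x in [1, 2, 3] for y in [1, 2, 3]]
```

Dropping the `list(...)` call and iterating over `product(lst, repeat=2)` directly avoids holding all n² pairs in memory at once, which may matter for folders with many files.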