I'm trying to write a script that lists all directories, subdirectories, and files in a given directory. I tried this:
import sys,os
root = "/home/patate/directory/"
path = os.path.join(root, "targetdirectory")
for r,d,f in os.walk(path):
    for file in f:
        print os.path.join(root,file)
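Unfortunately, it doesn't work properly. I get all the files, but not their full paths.
For example, if the directory structure is:
/home/patate/directory/targetdirectory/123/456/789/file.txt
it prints:
/home/patate/directory/targetdirectory/file.txt
What I need is the first result. Any help would be greatly appreciated! Thanks.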
Answer 0 (score: 175)
Use os.path.join to concatenate the directory and the file name:
for path, subdirs, files in os.walk(root):
    for name in files:
        print(os.path.join(path, name))
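Note the use of path rather than root in the concatenation. Using root would be incorrect: root is always the top-level directory, so joining with it drops the subdirectories the file actually lives in.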
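In Python 3.4, the pathlib module was added to make path manipulation easier. The equivalent of os.path.join would be:
pathlib.PurePath(path, name)
The advantage of pathlib is that you can call a variety of useful methods on paths. If you use the concrete Path variant, you can also make actual OS calls through them, such as listing a directory, deleting the path, or opening the file it points to.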
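As a rough illustration of the concrete Path variant, here is a minimal sketch that lists every file below a directory with Path.rglob instead of os.walk. The helper name iter_files is made up for this example and is not part of pathlib.

```python
from pathlib import Path


def iter_files(root):
    """Yield a Path for every file below root, at any depth.

    iter_files is a hypothetical helper name used only for this sketch.
    """
    for entry in Path(root).rglob("*"):
        # rglob("*") also yields directories, so keep files only.
        if entry.is_file():
            yield entry


if __name__ == "__main__":
    for file_path in iter_files("."):
        print(file_path)
```

Each yielded Path keeps the root you passed in as its prefix, so passing an absolute root gives absolute paths.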
Answer 1 (score: 29)
Just in case... to get all files in a directory and its subdirectories that match some pattern (e.g. *.py):
import os
from fnmatch import fnmatch
root = '/some/directory'
pattern = "*.py"
for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            print(os.path.join(path, name))
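If you need the matches as values rather than printed output, the same idea can be wrapped in a generator. This is only a sketch: find_files is a name I made up, and it uses fnmatch.filter to test a whole directory's file names at once.

```python
import fnmatch
import os


def find_files(root, pattern):
    """Yield full paths of files under root whose base name matches pattern.

    find_files is a hypothetical helper name used only for this sketch.
    """
    for path, subdirs, files in os.walk(root):
        # fnmatch.filter returns the names in files that match the pattern.
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(path, name)


if __name__ == "__main__":
    for match in find_files(".", "*.py"):
        print(match)
```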
Answer 2 (score: 7)
import os
[val for sublist in [[os.path.join(i[0], j) for j in i[2]] for i in os.walk('./')] for val in sublist]
# Meta comment to ease selecting text
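The outermost val for sublist in ... loop flattens the list into one dimension. The j loop collects a list of every file basename and joins it to the current path. Finally, the i loop iterates over all directories and subdirectories.
This example uses the hard-coded path ./ in the os.walk(...) call; you can substitute any path string you like.
Note: os.path.expanduser and/or os.path.expandvars can be used for path strings such as ~/
It is easy to add file-basename tests and directory-name tests.
For example, to test for *.jpg files: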
... for j in i[2] if j.endswith('.jpg')] ...
Additionally, to exclude the .git directory:
... for i in os.walk('./') if '.git' not in i[0].split('/')]
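For readers who find the nested comprehension hard to follow, here is a sketch of the same two filters written as a loop. It is a different technique from the one-liner: rather than testing i[0] afterwards, it prunes dirnames in place, which tells os.walk not to descend into .git at all. The function name jpg_files_outside_git is hypothetical.

```python
import os


def jpg_files_outside_git(top):
    """Yield paths of *.jpg files under top, skipping any .git directory.

    jpg_files_outside_git is a hypothetical name used only for this sketch.
    """
    for dirpath, dirnames, filenames in os.walk(top):
        # Editing dirnames in place stops os.walk from entering .git.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if name.endswith(".jpg"):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for image in jpg_files_outside_git("./"):
        print(image)
```

In-place pruning only works with the default top-down traversal of os.walk.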
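Answer 3 (score: 5)
You should use 'r' instead of 'root' in the join.
Answer 4 (score: 3)
I can't comment, so I'm writing an answer here. This is the clearest one-liner I have seen: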
import os
[os.path.join(path, name) for path, subdirs, files in os.walk(root) for name in files]
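Answer 5 (score: 1)
You can take a look at this sample I made. It uses the os.path.walk function, which is deprecated (and no longer exists in Python 3), and stores all the file paths in a list.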
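import os
import fnmatch
root = "Your root directory"
ex = ".txt"
where_to = "Wherever you wanna write your file to"
def fileWalker(ext, dirname, names):
    '''Append every entry of names that matches the extension ext[0] to the list ext[1].'''
    pat = "*" + ext[0]
    for f in names:
        if fnmatch.fnmatch(f, pat):
            ext[1].append(os.path.join(dirname, f))
def writeTo(fList):
    # Write one path per line to the output file.
    with open(where_to, "w") as f:
        for di_r in fList:
            f.write(di_r + "\n")
if __name__ == '__main__':
    li = []
    # Python 2 only: os.path.walk calls fileWalker(arg, dirname, names) for each directory.
    os.path.walk(root, fileWalker, [ex, li])
    writeTo(li)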
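Since os.path.walk is not available in Python 3, here is a sketch of roughly the same thing built on os.walk. The names collect_files and write_paths are my own, not from the sample above. One difference to be aware of: os.path.walk passed directory names as well as file names to the callback, while this version only looks at files.

```python
import fnmatch
import os


def collect_files(root, ext):
    """Return the paths of all files under root whose name ends with ext.

    collect_files and write_paths are hypothetical names for this sketch.
    """
    matches = []
    for dirname, subdirs, names in os.walk(root):
        for name in fnmatch.filter(names, "*" + ext):
            matches.append(os.path.join(dirname, name))
    return matches


def write_paths(paths, destination):
    """Write one path per line to the file at destination."""
    with open(destination, "w") as out:
        for path in paths:
            out.write(path + "\n")


if __name__ == "__main__":
    # Prints the matches instead of writing them, to avoid creating a file here.
    for found in collect_files(".", ".txt"):
        print(found)
```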
Answer 6 (score: 0)
A somewhat simple one-liner:
import os
from itertools import product, chain
root = "."  # directory to walk
paths = list(chain.from_iterable([[os.path.join(*w) for w in product([i[0]], i[2])] for i in os.walk(root)]))
Answer 7 (score: 0)
Since every example here uses only walk (with join), I'd like to show a nice example and a comparison with listdir:
import os, time
def listFiles1(root): # listdir
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0)+"/"; items = os.listdir(folder) # items = folders + files
        for i in items: i=folder+i; (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles
def listFiles2(root): # listdir/join (takes ~1.4x as long) (and uses '\\' instead)
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0); items = os.listdir(folder) # items = folders + files
        for i in items: i=os.path.join(folder,i); (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles
def listFiles3(root): # walk (takes ~1.5x as long)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[folder.replace("\\","/")+"/"+file] # folder+"\\"+file still ~1.5x
    return allFiles
def listFiles4(root): # walk/join (takes ~1.6x as long) (and uses '\\' instead)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[os.path.join(folder,file)]
    return allFiles
for i in range(100): files = listFiles1("src") # warm up
start = time.time()
for i in range(100): files = listFiles1("src") # listdir
print("Time taken: %.2fs"%(time.time()-start)) # 0.28s
start = time.time()
for i in range(100): files = listFiles2("src") # listdir and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.38s
start = time.time()
for i in range(100): files = listFiles3("src") # walk
print("Time taken: %.2fs"%(time.time()-start)) # 0.42s
start = time.time()
for i in range(100): files = listFiles4("src") # walk and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.47s
As you can see, the listdir version is more efficient. (And join is slow.)
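One more variant that may be worth timing on your own machine is os.scandir, available since Python 3.5 (os.walk itself is built on it from that version on). The sketch below is an assumption-laden illustration, not a measured result: I have no timings for it, and list_files_scandir is a made-up name.

```python
import os


def list_files_scandir(root):
    """Return the paths of all files under root, using os.scandir.

    list_files_scandir is a hypothetical name used only for this sketch.
    """
    all_files = []
    pending = [root]
    while pending:
        folder = pending.pop()
        with os.scandir(folder) as entries:
            for entry in entries:
                # is_dir can usually answer from the directory listing itself,
                # so no separate os.path.isdir call is needed per entry.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                else:
                    all_files.append(entry.path)
    return all_files


if __name__ == "__main__":
    print(len(list_files_scandir(".")))
```

Symlinks to directories are not followed here, so they end up in the file list instead of being descended into.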
Answer 8 (score: 0)
This is just an addition, so that you can get the data into CSV format:
import os
from fnmatch import fnmatch
try:
    import pandas as pd
except ImportError:
    # Install pandas on the fly, then retry the import.
    os.system("pip3 install pandas")
    import pandas as pd
root = "/home/kiran/Downloads/MainFolder" # it may have many subfolders and files inside
lst = []
pattern = "*.csv" # use this pattern to get only csv files
pattern = "*" # use this pattern to get every file, whatever its extension
for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            lst.append(os.path.join(path, name))
df = pd.DataFrame({"filePaths": lst})
df.to_csv("filepaths.csv")
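If installing pandas just for this feels heavy, the standard library csv module can write the same single-column file. This is a minimal sketch: write_file_paths_csv is a made-up name, and the column header filePaths is simply copied from the snippet above.

```python
import csv
import os


def write_file_paths_csv(root, csv_path):
    """Write the path of every file under root to csv_path, one per row.

    write_file_paths_csv is a hypothetical name used only for this sketch.
    Returns the number of paths written.
    """
    count = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filePaths"])  # header row
        for path, subdirs, files in os.walk(root):
            for name in files:
                writer.writerow([os.path.join(path, name)])
                count += 1
    return count
```

Keep csv_path outside root, otherwise the output file will be listed in itself.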
Answer 9 (score: 0)
A very simple solution is to run a couple of subprocess calls to export the files to CSV format:
import subprocess
# Global variables for directory being mapped
location = '.' # Enter the path here.
pattern = '*.py' # Use this if you want to only return certain filetypes
rootDir = location.rpartition('/')[-1]
outputFile = rootDir + '_directory_contents.csv'
# Find the requested data and export to CSV, specifying a pattern if needed.
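# The pattern is quoted so the shell passes it to find unexpanded; %T+ is the modification time (%A+ would be the access time).
find_cmd = 'find ' + location + ' -name "' + pattern + '" -fprintf ' + outputFile + ' "%Y%M,%n,%u,%g,%s,%T+,%P\n"'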
subprocess.call(find_cmd, shell=True)
The command produces comma-separated values that can easily be analyzed in Excel.
f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py
The resulting CSV file has no header row, but you can add one with a second command.
# Add headers to the CSV
headers_cmd = 'sed -i.bak 1i"Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath" ' + outputFile
subprocess.call(headers_cmd, shell=True)
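The find and sed commands above rely on GNU tools, so for systems without them here is a pure-Python sketch that writes a similar CSV, header included. It is a swapped-in technique, not the approach of this answer: export_listing is a made-up name, and the owner and group columns are left out because looking them up is platform-specific.

```python
import csv
import os
import stat
from datetime import datetime


def export_listing(location, output_file):
    """Write one CSV row per file under location, with a header row.

    export_listing is a hypothetical name used only for this sketch.
    """
    with open(output_file, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Permissions", "Links", "Size", "ModifiedTime", "FilePath"])
        for path, subdirs, files in os.walk(location):
            for name in files:
                full_path = os.path.join(path, name)
                info = os.lstat(full_path)
                writer.writerow([
                    stat.filemode(info.st_mode),  # e.g. -rw-r--r--
                    info.st_nlink,
                    info.st_size,
                    datetime.fromtimestamp(info.st_mtime).isoformat(),
                    os.path.relpath(full_path, location),  # path relative to location
                ])
```

As with the find version, write output_file somewhere outside location if you do not want it to appear in its own listing.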
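Depending on how much data you get back, you can massage it further with Pandas. Here are some things I found useful, especially when you're dealing with many levels of directories.
Add these to your imports: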
import numpy as np
import pandas as pd
Then add this to your code:
# Create DataFrame from the csv file created above.
df = pd.read_csv(outputFile)
# Format columns
# Get the filename and file extension from the filepath
df['FileName'] = df['FilePath'].str.rsplit("/", n=1).str[-1]
df['FileExt'] = df['FileName'].str.rsplit('.', n=1).str[1]
# Get the full path to the files. If the path doesn't include a "/" it's the root directory
df['FullPath'] = df["FilePath"].str.rsplit("/", n=1).str[0]
df['FullPath'] = np.where(df['FullPath'].str.contains("/"), df['FullPath'], rootDir)
# Split the path into columns for the parent directory and its children
df['ParentDir'] = df['FullPath'].str.split("/", n=1).str[0]
df['SubDirs'] = df['FullPath'].str.split("/", n=1).str[1]
# Account for NaN returns, indicates the path is the root directory
df['SubDirs'] = df['SubDirs'].fillna('')
# Determine if the item is a directory or file.
df['Type'] = np.where(df['Permissions'].str.startswith('d'), 'Dir', 'File')
# Split the time stamp into date and time columns
df[['ModifiedDate', 'Time']] = df.ModifiedTime.str.rsplit('+', n=1, expand=True)
df['Time'] = df['Time'].str.split('.').str[0]
# Show only files, output includes paths so you don't necessarily need to display the individual directories.
df = df[df['Type'].str.contains('File')]
# Set columns to show and their order.
df=df[['FileName','ParentDir','SubDirs','FullPath','FileExt','ModifiedDate','Time', 'Size']]
# To convert sizes to be more human readable (defined before it is used below)
def convert_bytes(size):
    for x in ['b', 'K', 'M', 'G', 'T']:
        if size < 1024:
            return "%3.1f %s" % (size, x)
        size /= 1024
    return size
# Go through the items and convert the filesize from bytes to something more readable.
df['Size'] = [convert_bytes(size) for size in df['Size']]
# Send the data to an Excel workbook with sheets by parent directory
with pd.ExcelWriter("scripts_directory_contents.xlsx") as writer:
    for directory, data in df.groupby('ParentDir'):
        data.to_excel(writer, sheet_name = directory, index=False)