Question

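In Python: I am trying to loop through the files in a directory, find the files whose names contain a specific string, and then open and edit those files. Everything seems to work except selecting the specific files in the directory by that string: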

import re
import datetime as dt
OldValue = input('Enter the value to be replaced: ')
NewValue = input('Enter the replacement value: ')
location = input('Enter path to directory: ')
directory = os.listdir(location)
os.chdir(location)
for root, dirs,files in os.walk('.'):
    for fname in files:
        re.match('PMPM', fname)

for f in os.listdir(location):
    for file in directory:
                open_file = open(file, 'r')
                read_file = open_file.read()
                regex = re.compile(OldValue)
                read_file = regex.sub(NewValue, read_file)
                write_file = open(file, 'w')
                write_file.write(read_file)
                now = dt.datetime.now()
                ago = now-dt.timedelta(minutes=30)
for root, dirs,files in os.walk('.'):
    for fname in files:
        path = os.path.join(root, fname)
st = os.stat(path)
mtime = dt.datetime.fromtimestamp(st.st_mtime)
if mtime > ago:
    print('%s modified %s' % (path,  mtime))

Answer 1

If all you want is a list of the filenames in a given directory that contain a given substring, then something like this should work:

#!python
import os
dir='.'         # Replace with path to your directory: absolute or relative
pattern = 'foo' # Replace with your target substring
matching_files = [f for f in os.listdir(dir) if pattern in f]

For the simplest case, that is all you need. You can then iterate over the matching_files list.
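To connect this back to the question, here is a minimal sketch of iterating over the matches and editing each file. The function name replace_in_matching_files and its parameters are hypothetical, and it assumes a literal text replacement rather than the regular-expression substitution used in the question:

```python
import os

def replace_in_matching_files(directory, name_substring, old, new):
    """Replace old with new in every file whose name contains name_substring.

    Returns the paths of the files that were actually changed.
    """
    changed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Skip subdirectories and names that do not contain the substring.
        if name_substring not in name or not os.path.isfile(path):
            continue
        with open(path, 'r') as handle:
            text = handle.read()
        updated = text.replace(old, new)  # literal replacement, not a regex
        if updated != text:
            with open(path, 'w') as handle:
                handle.write(updated)
            changed.append(path)
    return changed
```

A call such as replace_in_matching_files('.', 'PMPM', 'old', 'new') would only touch files in the current directory whose names contain PMPM.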

If you want to descend a directory tree with os.walk(), then you have to search the third item of each tuple returned by the generator.

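os.walk() recurses down a tree, yielding one tuple for each directory it visits. Each tuple contains three items: the leading path, a list of the subdirectories below it, and a list of the filenames at that node (directory entries for anything other than subdirectories).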

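However, there is one more trick! You need to prefix each match with the dirpath item for that level. In other words, for each match in item [2] of the tuple (the list of filenames), you need to concatenate it with the corresponding string in item [0] of the same tuple to get the full (absolute or relative) path of the matching filename.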

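One way to see how this works is to start a Python interpreter (preferably IPython, from the Jupyter project), instantiate a generator with walker = os.walk(dir) (where dir is any valid directory to use as a starting point), and then call this = next(walker). You can look at this[0] and this[2], and then go on to look at next(walker).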
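The interpreter session described above can be sketched as a short script. The directory layout (a.txt plus sub/b.txt in a temporary directory) and the helper build_demo_tree are hypothetical and exist only to give os.walk() something small to report:

```python
import os
import tempfile

def build_demo_tree(base):
    """Create base/a.txt and base/sub/b.txt (hypothetical demo layout)."""
    os.makedirs(os.path.join(base, 'sub'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(base, rel), 'w') as handle:
            handle.write('demo\n')

with tempfile.TemporaryDirectory() as base:
    build_demo_tree(base)
    walker = os.walk(base)
    first = next(walker)    # tuple for the starting directory
    second = next(walker)   # tuple for base/sub
    print(first[0] == base, first[1], first[2])
    # True ['sub'] ['a.txt']
    print(os.path.basename(second[0]), second[1], second[2])
    # sub [] ['b.txt']
```

Item [0] of each tuple is the path to prefix, and item [2] is the list of filenames to search.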

Let's start with code that returns a list using simple substring matching (as my earlier example did, but spread over multiple lines for clarity):

#!python
import os

results = list()
dir = '.'
walker = os.walk(dir)
delimiter = os.path.sep
pattern = '.txt'
for p,_,f in walker:
  # Test each filename x, not the whole list f, against the pattern
  matches = ['%s%s%s' % (p, delimiter, x) for x in f if pattern in x]
  results.extend(matches)

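In this case I am using tuple unpacking in the for loop to get at the items yielded by the os.walk() generator. The matches at each node of the tree are extracted in a list comprehension, which also prefixes each match with its path (using os.path.sep to keep the code portable across operating systems).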

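Also note that _ is just a variable name in Python, but it is conventionally used to "throw away" a value. In other words, using _ as a variable is a hint to readers and maintainers that this is something you do not need and that your code will not use later.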
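As a variation on the list-building loop above, os.path.join() can do the path prefixing instead of formatting with os.path.sep by hand. This is a sketch of the same idea, not the code from the answer; the function name find_by_substring is hypothetical:

```python
import os

def find_by_substring(top, pattern):
    """Return paths under top whose file names contain pattern."""
    results = []
    for dirpath, _, filenames in os.walk(top):
        # os.path.join inserts the platform's separator for us.
        results.extend(os.path.join(dirpath, name)
                       for name in filenames if pattern in name)
    return results
```

The _ in the loop header plays the role described above: the list of subdirectories is unpacked but never used.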

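It would be better to write this as a generator function and yield the results rather than performing the complete traversal up front (which can consume time and memory). By wrapping os.walk() in our own generator, we can more easily process each match under other conditions (find the first one, the first N, wrap it in more filtering, and so on).

I have also been using simple substring matching (with Python's in operator, which calls the .__contains__() special method). We could use regular expressions instead, though I recommend being wary of re.match(), which only matches a pattern at the beginning of each string it is applied to.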
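To illustrate the re.match() caveat, here is a small comparison with re.search(). The filenames are made up for the example:

```python
import re

# Hypothetical filenames: the target string appears at the start, in the
# middle, or not at all.
names = ['PMPM_report.txt', 'report_PMPM.txt', 'other.txt']

# re.match only succeeds when the pattern matches at the start of the string.
match_hits = [n for n in names if re.match('PMPM', n)]

# re.search succeeds when the pattern matches anywhere in the string.
search_hits = [n for n in names if re.search('PMPM', n)]

print(match_hits)   # ['PMPM_report.txt']
print(search_hits)  # ['PMPM_report.txt', 'report_PMPM.txt']
```

For "the filename contains this string" checks, re.search() is therefore the closer equivalent of the in operator.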

So here is the generator version:

#!python
import os, re

def matchwalk(regex, directory):
    '''Yield path/filenames matching some regular expression '''
    sep = os.path.sep
    pattern = re.compile(regex)
    for p,_,f in os.walk(directory):
        for each in f:
            if pattern.search(each):
                yield '{0}{1}{2}'.format(p,sep,each)

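This is similar to the previous code example. The differences: I have wrapped it in a function, and I use yield so that the function is a generator (just like os.walk()). I am also using regular expressions; I prefer re.compile() for legibility (and possibly a marginal performance benefit, though probably not in most Python implementations, since the re module usually caches compiled regular expressions, much as Python interns many strings). In addition, I am using the newer style of string formatting (though I personally prefer the older syntax; this is just for edification).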
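The benefit of a generator shows up when only some of the matches are needed. The sketch below restates matchwalk (using os.path.join() for brevity) so that it runs on its own, and adds a hypothetical helper, first_matches, that stops after a given number of results:

```python
import os
import re
from itertools import islice

def matchwalk(regex, directory):
    """Yield paths of files under directory whose names match regex."""
    pattern = re.compile(regex)
    for dirpath, _, filenames in os.walk(directory):
        for name in filenames:
            if pattern.search(name):
                yield os.path.join(dirpath, name)

def first_matches(regex, directory, limit):
    """Return at most limit matches without walking the rest of the tree."""
    return list(islice(matchwalk(regex, directory), limit))
```

Similarly, next(matchwalk('PMPM', '.'), None) would return the first match, or None when nothing matches, without traversing the whole tree.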

Answer 2

You might want to take a look at the standard library's Unix style pathname pattern expansion module, or simply glob.

To run through all files whose names start with 'PMPM' in a specific directory, say '~/path/to/mydir', it is as simple as:

import os
import glob

pattern = os.path.join(
    os.path.expanduser('~'),
    'path/to/mydir',
    'PMPM*' # mind the * here!
)

for matching_file in glob.glob(pattern):
    with open(matching_file, 'r') as f:
        # do something with the file object
        pass

Or, in short:

from glob import glob
for mf in glob('/home/someuser/path/to/mydir/PMPM*'):
    with open(mf, 'r') as f:
        pass # do something with f
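If the files may also sit in subdirectories, glob can search recursively as well. This is a sketch that assumes Python 3.5 or later, where glob.glob() accepts recursive=True; the function name find_recursive is hypothetical:

```python
import glob
import os

def find_recursive(top, name_pattern):
    """Return sorted paths under top whose last component matches name_pattern."""
    # glob.escape protects any special characters in the starting directory.
    pattern = os.path.join(glob.escape(top), '**', name_pattern)
    # recursive=True lets '**' match zero or more nested directories.
    return sorted(glob.glob(pattern, recursive=True))
```

A call such as find_recursive('.', 'PMPM*') would list matching entries in the current directory and everything below it. Note that a directory whose name matches the pattern would be returned too.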

How to loop through files and identify them by a string in the file name
