Question

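I'd like to loop through the files in a folder that have a particular extension, in this case .txt, open each file, and print the matches for a regex pattern. However, when I run my program, it only prints the results for one of the two files in the folder: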

Anthony is too cool for school. I Reported the criminal. I am Cool.

1: A, I, R, I, C

My second file contains the text:

Oh my initials are AK

And finally, my code:

import re, os

Regex = re.compile(r'[A-Z]')
filepath =input('Enter a folder path: ')
files = os.listdir(filepath)
count = 0

for file in files:
    if '.txt' not in file:
        del files[files.index(file)]
        continue
    count += 1
    fileobj = open(os.path.join(filepath, file), 'r')
    filetext = fileobj.read()
    Matches = Regex.findall(filetext)
    print(str(count)+': ' +', '.join(Matches), end = ' ')
    fileobj.close()

Is there a way to loop through (and open) a list of files? Is it because I assign every file object returned by open(os.path.join(filepath, file), 'r') to the same name, fileobj?

Answer 1

You can do it as simply as this (it just loops over the files):

import re, os

Regex = re.compile(r'[A-Z]')
filepath =input('Enter a folder path: ')
files = os.listdir(filepath)
count = 0

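for file in files:
    if '.txt' in file:
        count += 1  # number only the files that are actually processed
        fileobj = open(os.path.join(filepath, file), 'r')
        filetext = fileobj.read()
        Matches = Regex.findall(filetext)
        # end=' ' is a keyword argument; end == ' ' would raise a NameError
        print(str(count)+': ' +', '.join(Matches), end=' ')
        fileobj.close()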
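
One caveat with the '.txt' in file check: it is a substring test, so it also accepts names that merely contain .txt somewhere. A minimal sketch of a stricter check using os.path.splitext follows; the file names are invented for illustration, and the lower() call assumes you want to treat .TXT as a text file too.

```python
import os

# Hypothetical file names, chosen to show where the two checks disagree.
names = ['a.txt', 'b.txt.bak', 'c.TXT', 'notes.txt.d']

# Substring test: true whenever '.txt' appears anywhere in the name.
substring_hits = [n for n in names if '.txt' in n]

# Extension test: compares only the final suffix, ignoring case.
extension_hits = [n for n in names if os.path.splitext(n)[1].lower() == '.txt']

print(substring_hits)  # ['a.txt', 'b.txt.bak', 'notes.txt.d']
print(extension_hits)  # ['a.txt', 'c.TXT']
```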

Answer 2

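The del is causing the problem. The for loop doesn't know that you deleted an element, so it always advances to the next index. There is probably a hidden file in the directory, and it is the first element in files. Once it is deleted, the remaining elements shift down by one, so the for loop skips one of your files and goes straight to the second. To verify this, you can print files and file at the start of each iteration. In short, removing the del line should fix the problem.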
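
The skipping described above can be reproduced with a plain list, without touching the file system. This is only a sketch: the names are made up, and '.DS_Store' is just a stand-in for whatever hidden file happens to be in your folder.

```python
# Hypothetical directory listing; '.DS_Store' stands in for a hidden file.
files = ['.DS_Store', 'a.txt', 'b.txt']

visited = []
for name in files:
    visited.append(name)
    if '.txt' not in name:
        # Deleting shifts the remaining items left, but the loop's
        # internal index still advances, so 'a.txt' is never visited.
        del files[files.index(name)]

print(visited)  # ['.DS_Store', 'b.txt']

# Building a new list instead of mutating the one being iterated avoids the skip.
names = ['.DS_Store', 'a.txt', 'b.txt']
txt_files = [name for name in names if name.endswith('.txt')]
print(txt_files)  # ['a.txt', 'b.txt']
```

If you would rather filter first and loop afterwards, iterating over a filtered copy like txt_files is one safe option.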

If this is a standalone script, bash might be cleaner:

count=0
for file in "$1"/*.txt; do
    echo -n "${count}: $(grep -o '[A-Z]' "$file" | tr "\n" ",") "
    ((count++))
done

Answer 3

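The glob module will help you more here, since you want to read files with a specific extension.

You can directly get the list of files with the extension "txt", i.e. you save one 'if' construct.

See the glob module documentation for more information.

The code will be shorter and more readable.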

import glob

for file_name in glob.glob(r'C:/Users/dinesh_pundkar\Desktop/*.txt'):
    with open(file_name,'r') as f:
        text = f.read()
        # After this you can add code for regex matching,
        # which will match the pattern in the file text.
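
One way the regex step could be filled in, as a sketch rather than the answerer's exact code. The helper name uppercase_matches, the temporary folder, and the sample file names are all assumptions made so the example runs on its own; the two sample texts are modeled on the files from the question.

```python
import glob
import os
import re
import tempfile

UPPERCASE = re.compile(r'[A-Z]')

def uppercase_matches(folder):
    """Return (file name, matches) pairs for every .txt file in folder."""
    results = []
    # sorted() gives a stable order; glob.glob makes no ordering promise.
    for path in sorted(glob.glob(os.path.join(folder, '*.txt'))):
        with open(path, 'r') as f:
            results.append((os.path.basename(path), UPPERCASE.findall(f.read())))
    return results

# Demo on a throwaway folder (hypothetical file names).
with tempfile.TemporaryDirectory() as folder:
    samples = {
        'first.txt': 'Anthony is too cool for school. I Reported the criminal. I am Cool.',
        'second.txt': 'Oh my initials are AK',
        'notes.md': 'Not A Text File',  # ignored: wrong extension
    }
    for name, text in samples.items():
        with open(os.path.join(folder, name), 'w') as f:
            f.write(text)
    results = uppercase_matches(folder)

for count, (name, matches) in enumerate(results, start=1):
    print(str(count) + ': ' + ', '.join(matches))
# 1: A, I, R, I, C
# 2: O, A, K
```

Because the pattern '*.txt' already does the filtering, there is no list to mutate inside the loop and no 'if' on the extension.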

Loop through (and open) files of a specific type in a folder?
