Question

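Long-time lurker, first-time poster. Sorry if this isn't a good post... I wrote a program that uses regular expressions to extract names and emails from resumes. I can get it to open one specific file in my resumes folder, but getting the program to iterate over all the files in the folder has me stumped. Here is pseudocode for what I'm doing:

Open the resumes folder
- Read file1.txt
  - Run nameFinder
  - Run emailFinder
    - Create new dictionary candidateData
    - Export to Excel
- Read file2.txt
- ...

Here is the code: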

import re
import os
import pprint

with open('John Doe -Resume.txt', 'r') as f:

    #This pulls the first line of the resume,
    #Which is generally the name.
    first_line_name = f.readline().strip()

    #This pulls the Email from the resume.
    bulkemails = f.read()
    r = re.compile(r'(\b[\w.]+@+[\w.]+.+[\w.]\b)')
    candidateEmail = r.findall(bulkemails)
    emails = ""
    for x in candidateEmail:
            emails += str(x)+"\n"

            #This creates the dictionary data
            candidateData = {'candidateEmail' : str(candidateEmail), \
                              'candidateName' : str(first_line_name)}

    pprint.pprint(candidateData)

Then I get this as output:

{'candidateEmail': "['JohnDoe@gmail.com']",
'candidateName': 'John Doe'}

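All ready to export to Excel.

So, here's my question: how do I make it do this for all the .txt files in my resumes folder, and not just the one file I specify? Also, any comments on the code would be greatly appreciated. Thanks, guys! :D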

Answer 1

You can use glob to iterate over all the .txt files in the directory and then run your code on each file. Add this to the beginning:

import re
import os
import glob
import pprint

os.chdir("resumes")
for file in glob.glob("*.txt"):
    with open(file, 'r') as f:
        #Rest of your execution code here
        pass  #placeholder so the empty block is valid Python
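
If you would rather not change the working directory with os.chdir, one option is to put the folder name into the glob pattern instead. This is only a minimal sketch: the helper name list_resume_files and the default folder "resumes" are assumptions for illustration, not part of the original code.

```python
import glob
import os


def list_resume_files(folder="resumes"):
    """Return the paths of all .txt files directly inside `folder`, sorted."""
    # Build a pattern such as "resumes/*.txt" without touching the cwd.
    pattern = os.path.join(folder, "*.txt")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    for file in list_resume_files():
        with open(file, "r") as f:
            # The first line is assumed to be the candidate's name.
            first_line_name = f.readline().strip()
        print(file, first_line_name)
```

Because the returned paths already include the folder, they can be passed straight to open() from wherever the script is started.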

Edit: To answer your question in the comments:

import re
import os
import glob
import pprint

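#Compile the pattern once, outside the loop. The dot is escaped and "@"
#is matched exactly once, so a match stops at the end of the address
#instead of running on to the end of the line.
r = re.compile(r'\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b')

candidateDataList = []
#"*.txt" is relative to the current working directory.
for file in glob.glob("*.txt"):
    with open(file, 'r') as f:

        #This pulls the first line of the resume,
        #Which is generally the name.
        first_line_name = f.readline().strip()

        #This pulls the Email from the resume.
        bulkemails = f.read()
        candidateDataList.append({'name':str(first_line_name),
                                  'email':r.findall(bulkemails)})

pprint.pprint(candidateDataList)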
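
Since the goal is to get the results into Excel, one simple route is to write the list out as a CSV file, which Excel can open directly. The sketch below uses only the standard csv module; the function name export_candidates and the output file name are assumptions for illustration, and it expects the 'name' and 'email' keys used in the list above.

```python
import csv


def export_candidates(candidates, csv_path):
    """Write one row per candidate; multiple emails are joined with '; '."""
    # newline="" is what the csv module expects when opening files for writing.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "email"])
        writer.writeheader()
        for candidate in candidates:
            writer.writerow({
                "name": candidate["name"],
                "email": "; ".join(candidate["email"]),
            })
```

Calling export_candidates(candidateDataList, "candidates.csv") after the loop would produce a file with a header row followed by one line per resume.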

Answer 2

@Jakob's answer is spot on. I just want to mention a nice alternative that I usually prefer, pathlib:

import re
import pprint
from pathlib import Path

resumes_dir = Path("resumes")
for path in resumes_dir.glob("*.txt"):
    with path.open() as f:
        #Rest of your execution code here
        pass  #placeholder so the empty block is valid Python
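
To make the pathlib version concrete, here is a rough sketch of how the whole thing could be put together. The names EMAIL_RE, parse_resume and parse_folder are made up for this example, and the email pattern is a simplified assumption rather than a complete address matcher.

```python
import re
from pathlib import Path

# Simplified e-mail pattern (an assumption, not a full RFC 5322 matcher).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def parse_resume(path):
    """Return the first line (assumed to be the name) and all e-mails found."""
    text = path.read_text(encoding="utf-8")
    lines = text.splitlines()
    name = lines[0].strip() if lines else ""
    # The whole file is searched, including the first line.
    return {"name": name, "email": EMAIL_RE.findall(text)}


def parse_folder(folder):
    """Parse every .txt file directly inside `folder`, in sorted order."""
    return [parse_resume(p) for p in sorted(Path(folder).glob("*.txt"))]
```

Keeping the per-file work in its own function means the loop stays a one-liner, and parse_resume can be tried on a single file before running it over the whole folder.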

Run a data parser on multiple files in a folder? Python

2 answers: