Question

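I am trying to sort résumés in PDF and DOC format into separate directories: PDF files into a /PDF directory and DOC files into a /DOCX directory. My concerns are: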

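Are my regular expressions correct for finding the PDF and DOC files? The résumés have names such as john right ResumeQA.doc, abcResumeC.doc, ShawnResume.pdf, johnright_ResumeQA.pdf.
I am not getting any counts or output, either in the IDE or in the output file.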

The code I came up with is below:

import os, sys, re

countpdf, countdoc = 0, 0

pdf = re.compile(r'\b\w*{resume}\w*\.[pdf]\b')
docx = re.compile(r'\b\w*{resume}\.[doc]\b]')

#os.mkdir(r'/Users/Desktop/Networking materials/PDF')
pdfdir = os.path.dirname(r'/Users/Desktop/Networking materials/PDF/')
print  pdfdir

#os.mkdir(r'/Users/Desktop/Networking materials/DOCX')
docxdir = os.path.dirname(r'/User/Desktop/Networking materials/DOCX/')
print docxdir

out = sys.stdout
with open('output.txt', 'w') as outfile:
     sys.stdout = outfile
     for rdir, directory, files in os.walk(r'/Users/Desktop/Networking materials/'):
         match1 = re.findall(pdf, str(files))
         print match1
         for items1 in match1:
             os.chdir(pdfdir)
             countpdf +=1
         print countpdf

         match2 = re.findall(docx, str(files))
         print match2
         for items2 in match2:
             os.chdir(docxdir)
             countdoc +=1
         print countdoc
         sys.stdout = out

So far, the only output is:

 /Users/Desktop/Networking materials/PDF
 /Users/Desktop/Networking materials/DOCX

Could anyone correct my code and, if possible, suggest a more efficient way to accomplish this task?

Answer 1

No, your regular expressions are not correct. You can easily test them in the Python shell:

In [17]: a
Out[17]: 
[u'john right ResumeQA.doc',
 u' abcResumeC.doc',
 u' ShawnResume.pdf',
 u' johnright_ResumeQA.pdf']

In [20]: pdf = '\b\w*{resume}\w*\.[pdf]\b'

In [21]: for j in a:
    print re.findall(pdf, j)
   ....:     
[]
[]
[]
[]

As you can see, there are no matches. You should check your regular expressions with a regex tester before using them.
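
To make the reason for the empty results concrete, here is a small check. In the original pattern, {resume} is not a placeholder, so the braces are matched literally, and [pdf] is a character class that matches a single letter p, d or f rather than the extension. Note also the r prefix: without it, Python turns \b into a backspace character. The name x{resume}y.p is made up purely to show what the pattern does accept.

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r'\b\w*{resume}\w*\.[pdf]\b')

names = ['john right ResumeQA.doc', 'abcResumeC.doc',
         'ShawnResume.pdf', 'johnright_ResumeQA.pdf']

# None of the real file names match, because none contain literal braces.
matches = [n for n in names if original.search(n)]
print(matches)

# A contrived name with literal braces and a one-letter "extension" does match.
odd_match = original.search('x{resume}y.p')
print(odd_match is not None)
```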

As far as I can see, the following regular expressions would do:

pdf_re = ".+resume\w*\.pdf"
doc_re = ".+resume\w*\.doc"

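These work as long as you pass the re.I flag to the regex, which tells the regex engine to ignore case.

The pdf regex above matches any string that starts with one or more arbitrary characters (.+), followed by the string 'resume' (case ignored), then zero or more word characters (\w*, that is letters, digits and underscores), then a literal dot (the dot is a special character, so it has to be escaped), and finally the string pdf.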

re.findall(".+resume.*\.pdf", j, re.I)
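
As a quick sanity check, the two suggested patterns can be run against the sample names from the question. This is only a sketch: compiling with re.I up front is one way of passing the flag, and the bare name Resume.pdf is a hypothetical case added to show a limitation of the leading .+.

```python
import re

names = ['john right ResumeQA.doc', 'abcResumeC.doc',
         'ShawnResume.pdf', 'johnright_ResumeQA.pdf']

# Compile once with re.I so 'Resume' and 'resume' both match.
pdf_re = re.compile(r'.+resume\w*\.pdf', re.I)
doc_re = re.compile(r'.+resume\w*\.doc', re.I)

pdf_names = [n for n in names if pdf_re.search(n)]
doc_names = [n for n in names if doc_re.search(n)]
print(pdf_names)
print(doc_names)

# '.+' needs at least one character before 'resume',
# so a file called just 'Resume.pdf' is not matched.
bare = pdf_re.search('Resume.pdf')
print(bare)
```

If files named exactly Resume.pdf should be caught as well, replacing .+ with .* would be one option.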

Moving on to the rest of your code.

This call: sys.stdout = outfile is not needed. If you want to write to a file, just use outfile.write(content).
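
A minimal sketch of writing the results straight to the file, leaving sys.stdout alone. The counts and the temporary path are placeholders chosen for illustration, not values from the question.

```python
import os
import tempfile

# Hypothetical counts, standing in for the results of the search.
countpdf, countdoc = 2, 2

# A temporary directory keeps the sketch from touching real files.
out_path = os.path.join(tempfile.mkdtemp(), 'output.txt')

with open(out_path, 'w') as outfile:
    # Write explicitly to the file; sys.stdout stays untouched.
    outfile.write('PDF resumes: %d\n' % countpdf)
    outfile.write('DOC resumes: %d\n' % countdoc)

# Read the file back to confirm what was written.
with open(out_path) as infile:
    content = infile.read()
print(content)
```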

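Searching the files with match1 = re.findall(pdf, str(files)) is not the way to go. files contains a list of file names; you want to find the specific files to move, not process all the file names joined together into one string.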
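
The difference can be seen on a small, made-up directory listing. With the case-insensitive pattern suggested earlier, matching against str(files) yields one long match spanning several names, whereas testing each name separately gives one result per file.

```python
import re

pdf_re = re.compile(r'.+resume\w*\.pdf', re.I)

# Hypothetical directory listing, as os.walk would supply it.
files = ['ShawnResume.pdf', 'notes.txt', 'johnright_ResumeQA.pdf']

# str(files) is one long string, so the greedy '.+' swallows
# several names and findall reports a single, useless match.
joined_matches = pdf_re.findall(str(files))
print(len(joined_matches))

# Testing each name separately gives one result per file.
per_file = [f for f in files if pdf_re.search(f)]
print(per_file)
```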

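Next: os.chdir changes the current working directory. It does not change the location of a file, and it does not move files. For how to move a file, see the related question on SO.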
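
A small experiment in a throwaway directory illustrates the point. The directory and file names are invented for the sketch, and shutil.move is used here as one standard-library way to move a file (os.rename is another).

```python
import os
import shutil
import tempfile

# Throwaway directories so nothing real is touched.
root = tempfile.mkdtemp()
pdfdir = os.path.join(root, 'PDF')
os.mkdir(pdfdir)

source = os.path.join(root, 'ShawnResume.pdf')
with open(source, 'w') as handle:
    handle.write('placeholder')

# os.chdir only changes the working directory of this process.
previous_cwd = os.getcwd()
os.chdir(pdfdir)
still_in_place = os.path.exists(source)
os.chdir(previous_cwd)

# shutil.move actually relocates the file.
target = os.path.join(pdfdir, 'ShawnResume.pdf')
shutil.move(source, target)
moved = os.path.exists(target) and not os.path.exists(source)
print(still_in_place)
print(moved)
```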

So you need to do something like this:

import os, re

for rdir, directory, files in os.walk(r'/home/pawel/Documents'):
    for f in files:
        # test each file name on its own; re.I lets 'Resume' match 'resume'
        if re.search(pdf_re, f, re.I):
            matching_file = os.path.join(rdir, f)
            target_location = os.path.join(pdfdir, f)
            # os.rename moves the file to its new location
            os.rename(matching_file, target_location)
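
Putting the pieces together, one possible shape for the whole task is sketched below. The function name sort_resumes, the counts dictionary and the demo file names are all invented for illustration. Skipping the target directories during the walk is an added assumption, made because the question places PDF and DOCX under the directory being searched.

```python
import os
import re
import shutil
import tempfile

PDF_RE = re.compile(r'.+resume\w*\.pdf', re.I)
DOC_RE = re.compile(r'.+resume\w*\.doc', re.I)


def sort_resumes(root, pdfdir, docdir):
    """Move matching resumes under root into pdfdir/docdir; return counts."""
    for target in (pdfdir, docdir):
        if not os.path.isdir(target):
            os.makedirs(target)
    skip = {os.path.abspath(pdfdir), os.path.abspath(docdir)}
    counts = {'pdf': 0, 'doc': 0}
    for rdir, dirs, files in os.walk(root):
        # Do not descend into the target directories themselves.
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(rdir, d)) not in skip]
        for name in files:
            if PDF_RE.search(name):
                kind, target = 'pdf', pdfdir
            elif DOC_RE.search(name):
                kind, target = 'doc', docdir
            else:
                continue
            shutil.move(os.path.join(rdir, name), os.path.join(target, name))
            counts[kind] += 1
    return counts


# Demo on a throwaway directory with made-up file names.
demo_root = tempfile.mkdtemp()
for demo_name in ('ShawnResume.pdf', 'abcResumeC.doc', 'notes.txt'):
    open(os.path.join(demo_root, demo_name), 'w').close()
demo_counts = sort_resumes(demo_root,
                           os.path.join(demo_root, 'PDF'),
                           os.path.join(demo_root, 'DOCX'))
print(demo_counts)
```

The sketch does not handle two résumés with the same file name in different subdirectories; a real script would need to decide whether to rename, skip or overwrite in that case.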
