Getting a NoneType error when changing filenames with regular expressions in Python

Asked: 2015-02-01 21:00:24

Tags: python regex filenames

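I'm trying to rename a bunch of files using regular expression groups, but I can't get it to work (even though regexr.com tells me that what I wrote should be a valid regex). The 93,000 files I currently have all look like this: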

Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt    
Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt
Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt
The PRESIDING OFFICER.2012-12-06.2014sep19_at_182320.txt

I want them to look like this:

20120731McCONNELL2014sep19_at_182325.txt

and I want to ignore any file that starts with anything other than Mr., Mrs., or Ms.

But every time I run the script below, I get the following error:

Traceback (most recent call last):
  File "changefilenames.py", line 11, in <module>
    date = m.group(2)
AttributeError: 'NoneType' object has no attribute 'group'

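Thanks very much for your help, and apologies if this is a silly question. I'm just getting started with regex and Python and can't seem to figure this out.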

import io
import os
import re
from dateutil.parser import parse


for filename in os.listdir("/Users/jolijttamanaha/Desktop/thesis2/Republicans/CRspeeches"):
    if filename.startswith("Mr."):

        m = re.search("Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mr"

    if filename.startswith("Mrs."):

        m = re.search("Mrs.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mrs"

    if filename.startswith("Ms."):

        m = re.search("Ms.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mrs"

I adapted this from "Using Regex to Change Filenames with Python", but still no luck.

Edit: Based on the answers below, I made the following changes:

for filename in os.listdir("/Users/jolijttamanaha/Desktop/thesis2/Republicans/CRspeeches"):
    if filename.startswith("Mr."):
        print filename
        m = re.search("^Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        if m:
            date = m.group(2)
            name = m.group(1)
            timestamp = m.group(3)

            dt = parse(date)
            new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"

            os.rename(filename, new_filename)
            print new_filename

print "All done with the Mr"

It printed:

Mr. Adams was right.2009-05-18.2014sep17_at_22240.txt
Mr. ADAMS.2009-12-16.2014sep18_at_223650.txt
Traceback (most recent call last):
  File "changefilenames.py", line 19, in <module>
    os.rename(filename, new_filename)
OSError: [Errno 2] No such file or directory

2 answers:

Answer 0 (score: 1)

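After a search, always make sure you actually have a match before doing any processing: re.search returns None when the pattern does not match. It looks like you have a file that starts with 'Mr.' but does not match your expression as a whole.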

if filename.startswith("Mr."):

    m = re.search("Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
    if m: # Only look at groups if we have a match.
        date = m.group(2)
        name = m.group(1)
        ....

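I'd also suggest not using both startswith('Mr.') and the regex, since your regex should only apply to strings that start with 'Mr.' anyway. You may want to add a '^' to the start of the regex to enforce that: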

m = re.search("^Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
if m:        # ^ added caret to signify start of string.
    date = m.group(2)
    name = m.group(1)
    ...
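As a small illustration of the anchoring point, here is a sketch comparing re.search with and without '^', and re.match, which only matches at the start of the string. The sample strings and the shortened pattern are made up for the example and are not taken from the question's data.

```python
import re

# Shortened, illustrative pattern: literal "Mr." followed by one word.
pattern = r"Mr\.\s(\w+)"

# Without an anchor, re.search finds "Mr." anywhere in the string.
unanchored = re.search(pattern, "Dear Mr. ADAMS")

# With '^', the match must begin at the start of the string.
anchored = re.search("^" + pattern, "Dear Mr. ADAMS")

# re.match behaves like an implicit '^' at the start of the pattern.
matched = re.match(pattern, "Dear Mr. ADAMS")
matched_at_start = re.match(pattern, "Mr. ADAMS")

print(unanchored is not None)     # a match exists mid-string
print(anchored is None)           # no match once anchored
print(matched is None)            # re.match agrees with the anchored search
print(matched_at_start.group(1))  # the captured name
```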

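Also, you may want to check which files fail to match. With this much data you will often run into things like extra spaces or inconsistent capitalization, so you might consider making your regex more robust.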
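One possible way to combine these suggestions is sketched below. It is not the asker's script: the names NAME_RE, build_new_name, and split_matches are hypothetical, the single pattern covering Mr., Mrs., and Ms. is an assumption, and the output format (YYYYMMDD, then name, then timestamp) is inferred from the example filename in the question. The literal dots are escaped here, because an unescaped '.' matches any character.

```python
import re

# Assumed pattern: honorific, one-word name, ISO date, timestamp, ".txt".
# \w+ does not match spaces, so multi-word names will not match.
NAME_RE = re.compile(
    r"^(?:Mr|Mrs|Ms)\.\s+(\w+)\.(\d{4})-(\d{2})-(\d{2})\.(\w+)\.txt$"
)


def build_new_name(filename):
    """Return the new filename, or None if the name does not match."""
    m = NAME_RE.search(filename)
    if m is None:  # only read groups when there is a match
        return None
    name, year, month, day, timestamp = m.groups()
    return year + month + day + name + timestamp + ".txt"


def split_matches(filenames):
    """Separate names that can be renamed from those that cannot."""
    renames, unmatched = {}, []
    for fn in filenames:
        new_fn = build_new_name(fn)
        if new_fn is None:
            unmatched.append(fn)
        else:
            renames[fn] = new_fn
    return renames, unmatched


samples = [
    "Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt",
    "Mr. Adams was right.2009-05-18.2014sep17_at_22240.txt",
    "The PRESIDING OFFICER.2012-12-06.2014sep19_at_182320.txt",
]
renames, unmatched = split_matches(samples)
print(renames)
print(unmatched)
```

Printing the unmatched list before renaming anything makes it easier to decide whether the pattern needs loosening, for example to allow spaces inside the name.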

Answer 1 (score: 1)

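You are passing bare filenames to os.rename, so the directory part of the path is probably missing. os.listdir returns names only, not full paths.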

Consider the following layout:

yourscript.py
subdir/
  - one
  - two

This is similar to your code:

import os

for fn in os.listdir('subdir'):
    print(fn)
    os.rename(fn, fn + '_moved')

and it throws an exception (the message is slightly more helpful in Python 3):

FileNotFoundError: [Errno 2] No such file or directory: 'two' -> 'two_moved'

because there is no file named two in the current working directory. But consider this:

import os

for fn in os.listdir('subdir'):
    print(fn)
    os.rename(os.path.join('subdir',fn), os.path.join('subdir', fn+'_moved'))

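This works because full paths are used. Rather than repeating 'subdir' over and over (or keeping it in a variable), you can change the working directory as the first step: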

import os

os.chdir('subdir')

for fn in os.listdir():
    print(fn)
    os.rename(fn, fn + '_moved')
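For a script that should not change its working directory, the os.path.join variant can be wrapped in a small helper. The following is only a sketch: rename_all and its make_name callback are hypothetical names, and skipping a file whose target name already exists is an added safety assumption, not something the question asked for.

```python
import os


def rename_all(directory, make_name):
    """Rename files in `directory` using full paths.

    `make_name` takes a bare filename and returns the new bare filename,
    or None to leave the file alone.
    """
    renamed = []
    # Take a snapshot of the listing before renaming anything.
    for fn in sorted(os.listdir(directory)):
        new_fn = make_name(fn)
        if new_fn is None or new_fn == fn:
            continue
        src = os.path.join(directory, fn)
        dst = os.path.join(directory, new_fn)
        if os.path.exists(dst):
            continue  # assumption: never overwrite an existing file
        os.rename(src, dst)
        renamed.append((fn, new_fn))
    return renamed
```

With the layout above, rename_all('subdir', lambda fn: fn + '_moved') would rename one and two inside subdir regardless of where the script is run from.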