Renaming files in Python using regular expressions

Date: 2015-01-26 15:50:33

Tags: python regex file

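I am trying to rename a batch of files using regular expression groups, but I can't get it to work (even though regexr.com tells me that what I wrote should be a valid regex). The 93,000 files I currently have all look like this: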

Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt
Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt
Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt

I want them to look like this:

20120731McCONNELL2014sep19_at_182325.txt

But every time I run the script below, I get the following error:

Traceback (most recent call last):
  File "changefilenames.py", line 11, in <module>
    date = m.group(2)
AttributeError: 'NoneType' object has no attribute 'group'

Thank you very much for your help. I apologize if this is a silly question. I am just getting started with RegEx and Python and can't figure this out.

import os
import re
from dateutil.parser import parse


for filename in os.listdir("."):
    if filename.startswith("Mr."):

        m = re.match("Mr.\s(\w*).(\d*-\d*-\d*).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}{dt.month}{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mr"

    if filename.startswith("Mrs."):

        m = re.match("Ms.\s(\w*).(\d*-\d*-\d*).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}{dt.month}{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mrs"

    if filename.startswith("Ms."):

        m = re.match("Mrs.\s(\w*).(\d*-\d*-\d*).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)

        dt = parse(date)
        new_filename = "{dt.year}{dt.month}{dt.day}".format(dt=dt) + name + timestamp + ".txt"

        os.rename(filename, new_filename)
        print new_filename

    print "All done with the Mrs" 

EDIT: I changed the script based on the suggestions below, but I still get exactly the same error. Here is the new script:

for filename in os.listdir("."):

    m = re.search("(Mr|Mrs|Ms)\.\s(\w*)\.(\d*\-\d*\-\d*)\.(\w*)\.txt", filename)
    date = m.group(2)
    name = m.group(1)
    timestamp = m.group(3)

    dt = parse(date)
    new_filename = "{dt.year}{dt.month}{dt.day}".format(dt=dt) + name + timestamp + ".txt"

    os.rename(filename, new_filename)
    print new_filename

4 Answers:

Answer 0 (score: 0)

You have to use re.search instead of re.match; for details, read search() vs. match().

>>> s="Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt "
>>> import re
>>> m = re.search("Mr.\s(\w*).(\d*-\d*-\d*).(\w*).txt", s)

>>> date = m.group(2)
>>> date
'2012-07-31'
>>> name = m.group(1)
>>> name
'McCONNELL'
>>> timestamp = m.group(3)
>>> timestamp
'2014sep19_at_182325'
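
To make the difference concrete, here is a small sketch in Python 3 syntax. The helper name `parse_name` and the non-matching sample names are assumptions made up for illustration. `re.match` only tries the pattern at the start of the string, while `re.search` scans for a match anywhere. Both return `None` when nothing matches, so checking the result before calling `.group()` avoids the `AttributeError` from the question whichever function is used.

```python
import re

# Pattern adapted from this answer: raw string, literal dots escaped.
PATTERN = re.compile(r"Mr\.\s(\w*)\.(\d*-\d*-\d*)\.(\w*)\.txt")


def parse_name(filename, use_search=True):
    """Return (name, date, timestamp), or None if the filename does not match."""
    finder = PATTERN.search if use_search else PATTERN.match
    m = finder(filename)
    if m is None:  # guard: calling .group() on None raises AttributeError
        return None
    return m.group(1), m.group(2), m.group(3)


sample = "Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt"

print(parse_name(sample))                    # the title is at the start, so search matches
print(parse_name(sample, use_search=False))  # and so does match
print(parse_name("copy of " + sample))       # only search finds the pattern mid-string
print(parse_name("copy of " + sample, use_search=False))  # None: match anchors at the start
print(parse_name("changefilenames.py"))      # None: no match anywhere
```

For filenames that begin with the title, as in the question, the two functions behave the same, so the `None` check is probably the more important part of this sketch.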

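Answer 1 (score: 0)

Here are my suggestions for the regular expression.

  1. Escape the special characters (periods and dashes).
  2. Merge the regular expressions by grouping the prefixes.
  3. Group the digits so they can be retrieved by group later.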

    (Mr|Mrs|Ms)\.\s(\w*)\.(\d*)\-(\d*)\-(\d*)\.(\w*)\.txt
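
As a rough sketch of how the groups of this combined pattern might be used (Python 3 syntax; the function name `new_filename` is an assumption for illustration): once the title becomes group 1, the name, date parts and timestamp shift to groups 2 through 6, and any file that does not match has to be skipped rather than passed to `.group()`.

```python
import re

# The combined pattern suggested above. Group 1 is the title, so the name,
# year, month, day and timestamp are groups 2 through 6.
PATTERN = re.compile(r"(Mr|Mrs|Ms)\.\s(\w*)\.(\d*)\-(\d*)\-(\d*)\.(\w*)\.txt")


def new_filename(filename):
    """Build a name like '20120731McCONNELL2014sep19_at_182325.txt', or return None."""
    m = PATTERN.match(filename)
    if m is None:  # e.g. the script itself, or any other unrelated file
        return None
    title, name, year, month, day, timestamp = m.groups()
    # The date parts stay strings, so leading zeros are preserved.
    return year + month + day + name + timestamp + ".txt"


for old in [
    "Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt",
    "Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt",
    "Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt",
    "changefilenames.py",
]:
    print(old, "->", new_filename(old))
```

Keeping the captured digits as strings also sidesteps a formatting issue in the question's script: `"{dt.month}"` renders July as `7`, not `07`, so the date would come out as `2012731`.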

Answer 2 (score: 0)

re.sub(r'^(?:Mr|Mrs|Ms)\. (\w+)\.(\d{4})-(\d{2})-(\d{2})\.(\d{4}\w+\d+_at_\d+)(\.txt)$', r'\2\3\4\1\5\6', 'Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt')
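
A substitution like this could be applied to a whole directory roughly as follows. This is only a sketch in Python 3 syntax: the function name `rename_all` and its `dry_run` flag are assumptions for illustration, and the title alternation here covers Mr, Mrs and Ms. It uses `re.subn`, which also reports how many substitutions were made, so files that do not match are skipped.

```python
import os
import re
import tempfile

PATTERN = re.compile(
    r"^(?:Mr|Mrs|Ms)\. (\w+)\.(\d{4})-(\d{2})-(\d{2})\.(\d{4}\w+\d+_at_\d+)(\.txt)$"
)


def rename_all(directory=".", dry_run=True):
    """Rename matching files in `directory`; return a list of (old, new) pairs."""
    renamed = []
    for old in sorted(os.listdir(directory)):
        new, count = PATTERN.subn(r"\2\3\4\1\5\6", old)
        if count == 0:  # not one of the target files, e.g. the script itself
            continue
        if not dry_run:
            os.rename(os.path.join(directory, old), os.path.join(directory, new))
        renamed.append((old, new))
    return renamed


# Demo in a throwaway directory so that no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    for fn in [
        "Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt",
        "Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt",
        "changefilenames.py",
    ]:
        open(os.path.join(tmp, fn), "w").close()
    print(rename_all(tmp, dry_run=False))
    print(sorted(os.listdir(tmp)))
```

With 93,000 files it may be worth running once with `dry_run=True` and inspecting the returned pairs before renaming anything.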

Answer 3 (score: 0)

I reworked it like this (disclaimer: I have not cleaned this up at all):

import re

from pprint import pprint

names = """
Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt
Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt
Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt
""".strip()

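for record in names.splitlines():
    # Split once at the first dot followed by a digit: "Mr. McCONNELL" | rest
    name, part2 = re.split('\.(?=\d)', record, 1)
    # The rest splits into date, timestamp and file extension
    date, at_time, fileext = re.split('\.', part2)

    pprint(record)
    pprint(''.join([
        date.replace('-', ''),
        # Python 2 form of str.translate: deletes spaces and dots (the title stays)
        name.translate(None, ' .',),
        at_time,
    ]) + '.' + fileext)

    print('\n')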
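
The snippet above relies on the two-argument `str.translate(None, ...)`, which exists only in Python 2, and it leaves the title in the result (`MrMcCONNELL`). A possible Python 3 variant of the same split-based idea is sketched below; the function name `rebuild` is an assumption for illustration, and dropping the title is an addition so that the output matches the format asked for in the question.

```python
import re


def rebuild(record):
    """Rearrange 'Title. NAME.YYYY-MM-DD.timestamp.ext' into 'YYYYMMDDNAMEtimestamp.ext'."""
    # Split once at the first dot that is followed by a digit.
    titled_name, rest = re.split(r"\.(?=\d)", record, maxsplit=1)
    # The remainder is date, timestamp and file extension.
    date, at_time, fileext = rest.split(".")
    # Drop the title ("Mr.", "Mrs.", "Ms.") by keeping what follows the first space.
    name = titled_name.split(" ", 1)[1]
    return date.replace("-", "") + name + at_time + "." + fileext


names = """
Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt
Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt
Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt
""".strip()

for record in names.splitlines():
    print(record, "->", rebuild(record))
```

Unlike the regex-based approaches, this version raises a `ValueError` on a name that does not have the expected shape (the unpacking fails), so unrelated files would need to be filtered out or the call wrapped in `try`/`except`.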