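I'm trying to rename a bunch of files using regex groups, but I can't get it to work (even though regexr.com tells me the regex should be valid). The 93,000 files I currently have all look like this: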
Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt
Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt
Ms. MURRAY.2012-06-18.2014sep19_at_182246.txt
The PRESIDING OFFICER.2012-12-06.2014sep19_at_182320.txt
I'd like them to look like this:
20120731McCONNELL2014sep19_at_182325.txt
and I want to ignore any file that starts with anything other than Mr., Mrs., or Ms. But every time I run the script below, I get the following error:
Traceback (most recent call last):
  File "changefilenames.py", line 11, in <module>
    date = m.group(2)
AttributeError: 'NoneType' object has no attribute 'group'
Thanks very much for your help. Apologies if this is a silly question. I'm just getting started with regex and Python and can't seem to figure this out.
import io
import os
import re
from dateutil.parser import parse

for filename in os.listdir("/Users/jolijttamanaha/Desktop/thesis2/Republicans/CRspeeches"):
    if filename.startswith("Mr."):
        m = re.search("Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)
        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"
        os.rename(filename, new_filename)
        print new_filename
        print "All done with the Mr"
    if filename.startswith("Mrs."):
        m = re.search("Mrs.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)
        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"
        os.rename(filename, new_filename)
        print new_filename
        print "All done with the Mrs"
    if filename.startswith("Ms."):
        m = re.search("Ms.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        date = m.group(2)
        name = m.group(1)
        timestamp = m.group(3)
        dt = parse(date)
        new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"
        os.rename(filename, new_filename)
        print new_filename
        print "All done with the Mrs"
I've already tried the adjustments from "Using Regex to Change Filenames with Python", but still no luck.
Edit: I made the following changes based on the answers below:
for filename in os.listdir("/Users/jolijttamanaha/Desktop/thesis2/Republicans/CRspeeches"):
    if filename.startswith("Mr."):
        print filename
        m = re.search("^Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
        if m:
            date = m.group(2)
            name = m.group(1)
            timestamp = m.group(3)
            dt = parse(date)
            new_filename = "{dt.year}.{dt.month}.{dt.day}".format(dt=dt) + name + timestamp + ".txt"
            os.rename(filename, new_filename)
            print new_filename
            print "All done with the Mr"
It printed:
Mr. Adams was right.2009-05-18.2014sep17_at_22240.txt
Mr. ADAMS.2009-12-16.2014sep18_at_223650.txt
Traceback (most recent call last):
  File "changefilenames.py", line 19, in <module>
    os.rename(filename, new_filename)
OSError: [Errno 2] No such file or directory
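Answer 0 (score: 1)
After doing a search, you always want to make sure you have a match before doing any processing. It looks like you have a file that starts with 'Mr.' but doesn't match your expression as a whole.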
if filename.startswith("Mr."):
    m = re.search("Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
    if m:  # Only look at groups if we have a match.
        date = m.group(2)
        name = m.group(1)
        ....
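I'd also recommend not using both startswith('Mr.') and the regex, since your regex should only apply to strings starting with 'Mr.' anyway, although you may want to add a '^' to the start of the regex to enforce that: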
m = re.search("^Mr.\s(\w*).(\d\d\d\d\-\d\d\-\d\d).(\w*).txt", filename)
if m:  # A caret (^) was added to the pattern to anchor it at the start of the string.
    date = m.group(2)
    name = m.group(1)
    ...
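Also, you may want to check which files fail to match. With this much data you will often run into problems such as extra spaces or inconsistent capitalization, so you may need to consider making your regex more robust.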
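To illustrate the points above, here is a minimal sketch rather than a definitive fix. It assumes a title followed by one space and a single-word surname, escapes the literal dots (an unescaped `.` matches any character), handles Mr., Mrs., and Ms. in one pattern, and collects the names that fail to match. The names `PATTERN`, `build_new_name`, and `unmatched` are made up for this example, and the date is reassembled directly from the captured digits to get the `20120731` form shown in the question. It only computes names and does not touch the file system.

```python
import re

# Assumed layout: title, one space, single-word surname, date, timestamp.
# Literal dots are escaped so "." no longer matches arbitrary characters.
PATTERN = re.compile(
    r"^(?:Mr|Mrs|Ms)\.\s(\w+)\.(\d{4})-(\d{2})-(\d{2})\.(\w+)\.txt$"
)


def build_new_name(filename):
    """Return the new name, or None when the filename does not match."""
    m = PATTERN.match(filename)
    if m is None:
        return None
    name, year, month, day, timestamp = m.groups()
    return year + month + day + name + timestamp + ".txt"


# Sample names taken from the question.
samples = [
    "Mr. McCONNELL.2012-07-31.2014sep19_at_182325.txt",
    "Mrs. HAGAN.2012-12-06.2014sep19_at_182321.txt",
    "Mr. Adams was right.2009-05-18.2014sep17_at_22240.txt",
    "The PRESIDING OFFICER.2012-12-06.2014sep19_at_182320.txt",
]

unmatched = []  # names to inspect by hand later
for sample in samples:
    new_name = build_new_name(sample)
    if new_name is None:
        unmatched.append(sample)
    else:
        print(sample + " -> " + new_name)
print("unmatched: %r" % unmatched)
```

Reviewing `unmatched` after a dry run like this shows which names need a looser pattern before anything is actually renamed.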
Answer 1 (score: 1)
You are passing a bare filename to os.rename, and it is probably missing the path.
Consider the following layout:
yourscript.py
subdir/
- one
- two
This is similar to your code:
import os
for fn in os.listdir('subdir'):
    print(fn)
    os.rename(fn, fn + '_moved')
and it raises an exception (with a slightly more helpful message in Python 3):
FileNotFoundError: [Errno 2] No such file or directory: 'two' -> 'two_moved'
because there is no file named two in the current working directory. But consider this:
import os
for fn in os.listdir('subdir'):
    print(fn)
    os.rename(os.path.join('subdir', fn), os.path.join('subdir', fn + '_moved'))
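This works because the full path is used. Instead of repeating 'subdir' over and over (or keeping it in a variable), you should change the working directory as a first step: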
import os
os.chdir('subdir')
# '.' also works on Python 2, where os.listdir() requires an argument.
for fn in os.listdir('.'):
    print(fn)
    os.rename(fn, fn + '_moved')
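If changing the working directory is not desirable, for example inside a larger script, the full-path approach can be wrapped in a small helper. The following is only a sketch under stated assumptions: `rename_all` and `make_name` are hypothetical names introduced here, and the demonstration runs in a temporary directory that stands in for the `subdir/` layout above.

```python
import os
import shutil
import tempfile


def rename_all(directory, make_name):
    """Rename every file in `directory` for which make_name returns a name.

    Returns the names that were skipped because make_name returned None.
    """
    skipped = []
    for fn in os.listdir(directory):
        new_fn = make_name(fn)
        if new_fn is None:
            skipped.append(fn)
            continue
        # Join both names with the directory so the current working
        # directory does not matter.
        os.rename(os.path.join(directory, fn), os.path.join(directory, new_fn))
    return skipped


# Demonstration in a throwaway directory mirroring the subdir/ layout.
demo_dir = tempfile.mkdtemp()
for fn in ("one", "two"):
    open(os.path.join(demo_dir, fn), "w").close()

# Hypothetical rule: rename "two", leave everything else alone.
skipped = rename_all(demo_dir, lambda fn: fn + "_moved" if fn == "two" else None)
listing = sorted(os.listdir(demo_dir))
print(listing, skipped)
shutil.rmtree(demo_dir)  # clean up the temporary directory
```

Passing the renaming rule in as a function keeps the path handling separate from the pattern matching, so either part can be tested on its own.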