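I am trying to write a simple program that opens the text files in a given directory, searches for all strings matching a given pattern, and replaces them with the desired strings while deleting all other information. I have two .txt files:
User_321.txt, which contains: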
321_AliceKelly001.jpg [size_info] [date_info] [geo_location_info] ... [other info]
321_AliceKelly002.jpg [size_info] [date_info] [geo_location_info] ... [other info]
321_AliceKelly003.jpg [size_info] [date_info] [geo_location_info] ... [other info]
...
321_AliceKelly125.jpg [size_info] [date_info] [geo_location_info] ... [other info]
and User_205.txt, which contains:
205_CarlCarlson001.jpg [size_info] [date_info] [geo_location_info] ... [other info]
205_CarlCarlson002.jpg [size_info] [date_info] [geo_location_info] ... [other info]
205_CarlCarlson_003.jpg [size_info] [date_info] [geo_location_info] ... [other info]
205_CarlCarlson007.jpg [size_info] [date_info] [geo_location_info] ... [other info]
I want User_321.txt to contain:
321_AliceKelly_001.jpg
321_AliceKelly_002.jpg
321_AliceKelly_003.jpg
...
321_AliceKelly_125.jpg
and User_205.txt to contain:
205_CarlCarlson_001.jpg
205_CarlCarlson_002.jpg
205_CarlCarlson_003.jpg
205_CarlCarlson_007.jpg
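So I only want to add a "_" between the name and the last three digits. I was able to handle the case where all entries are uniform, that is, where the file contains only entries of the form:
\d\d\d_[a-zA-Z]+\d\d\d.jpg [size_info] [date_info] [geo_location_info] ... [other info]
using the following code: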
    import os, re
    path = 'C:\\Users\\ME\\Desktop\\TEST'
    text_files = [filename for filename in os.listdir(path)]
    desired_text = re.compile(r'\w+.jpg')
    #desired_ending = re.compile(r'$[a-zA-Z]\d\d\d.jpg')
    for i in range(len(text_files)):
        working_file = path + '\\' + text_files[i]
        fin = open(working_file, 'r')
        match = ''
        for line in fin:
            mo1 = desired_text.search(line)
            if mo1:  # search() returns None when the line has no match
                match += mo1.group()[:-7] + '_' + mo1.group()[-7:]+'\n'
        fin.close()
        fout = open(working_file, 'w')
        fout.write(match)
        fout.close()
I am having trouble with the second case, where some entries are already in the desired form, for example:
205_CarlCarlson_003.jpg [size_info] [date_info] [geo_location_info] ... [other info]
205_CarlCarlson007.jpg [size_info] [date_info] [geo_location_info] ... [other info].
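I want it to skip renaming the entries that are already in the desired form and carry on with the rest.
I have looked at "How to search and replace text in a file using Python?", "Cheap way to search a large text file for a string", and "Search and replace a line in a file in Python". Those questions seem to be about searching for one specific string and replacing it with another using the fileinput module. I want to do something similar, but with much more flexibility in the search.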
Answer 0 (score: 1)
You can use parentheses for grouping and capturing:
    \b(\d{3}_[a-zA-Z]+)(\d{3}\.jpg)
and replace with \1_\2 to add the underscore in between. \b matches a word boundary. See demo at regex101 (Python code generator).
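In Python, the replacement above could be applied with re.sub. The following is a minimal sketch, not the answerer's own code; the function name add_underscore and the sample text are made up for illustration. Names that already contain the underscore do not match the pattern, so they are left untouched. Note that this only inserts the underscore and does not strip the trailing [size_info] ... columns.

```python
import re

# Pattern from the answer: group 1 is the ID plus name, group 2 is the digits plus extension.
PATTERN = re.compile(r'\b(\d{3}_[a-zA-Z]+)(\d{3}\.jpg)')

def add_underscore(text):
    """Insert "_" between the name and the three digits (hypothetical helper)."""
    # [a-zA-Z]+ cannot cross an underscore, so "Name_003.jpg" never matches
    # and already-correct entries pass through unchanged.
    return PATTERN.sub(r'\1_\2', text)

# Illustrative input only; real lines carry more columns.
sample = (
    "205_CarlCarlson_003.jpg [size_info] [date_info]\n"
    "205_CarlCarlson007.jpg [size_info] [date_info]\n"
)
print(add_underscore(sample))
```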
Answer 1 (score: 1)
I modified your code slightly to handle the two different cases, and it seems to work:
    import os, re
    path = 'C:\\Users\\ME\\Desktop\\TEST'
    text_files = [filename for filename in os.listdir(path)]
    # Case 1: no underscore before the three digits; case 2: already in the desired form.
    desired_text1 = re.compile(r'^\d{3}_[a-zA-Z]+\d{3}\.jpg')
    desired_text2 = re.compile(r'^\d{3}_[a-zA-Z]+_\d{3}\.jpg')
    for i in range(len(text_files)):
        working_file = path + '\\' + text_files[i]
        fin = open(working_file, 'r')
        match = ''
        for line in fin:
            mo1 = desired_text1.search(line)
            mo2 = desired_text2.search(line)
            if mo1:
                # Insert the underscore before the last 7 characters ("NNN.jpg").
                match += mo1.group()[:-7] + '_' + mo1.group()[-7:]+'\n'
            elif mo2:
                # Already correct: keep the file name unchanged.
                match += mo2.group() +'\n'
        fin.close()
        fout = open(working_file, 'w')
        fout.write(match)
        fout.close()
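As a variation on the two-pattern approach, both cases could be folded into a single pattern by making the underscore optional. This is only a sketch under that assumption; the names NAME, normalize_line and normalize_lines are hypothetical and do not appear in the answer above.

```python
import re

# Hypothetical combined pattern: "_?" makes the underscore before the digits optional,
# so one regex covers both "Name001.jpg" and "Name_001.jpg".
NAME = re.compile(r'^(\d{3}_[a-zA-Z]+)_?(\d{3}\.jpg)')

def normalize_line(line):
    """Return the normalized file name at the start of a line, or None if there is none."""
    mo = NAME.match(line)
    if mo is None:
        return None
    # Rebuild the name with exactly one underscore between the two captured parts.
    return mo.group(1) + '_' + mo.group(2)

def normalize_lines(lines):
    """Normalize every matching line and drop the lines that do not match."""
    results = (normalize_line(line) for line in lines)
    return [name for name in results if name is not None]
```

Because the result is rebuilt from the two captured groups, everything after the file name is discarded, which matches the output the question asks for.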
Answer 2 (score: 0)
You can do it like this:
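    with open('source.txt') as f:
        with open('destination.txt', 'w') as g:
            for line in f:
                # The first whitespace-separated field is the file name.
                parts = line.split(None, 1)
                if not parts:
                    continue  # skip blank lines
                if parts[0][-8:-7] == '_':
                    # Underscore already present before "NNN.jpg".
                    g.write(parts[0] + '\n')
                else:
                    g.write(parts[0][:-7] + '_' + parts[0][-7:] + '\n')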
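If you want the Windows newline sequence, feel free to change \n to \r\n (in Python 3, open the output file with newline='' in that case, because text mode on Windows already translates \n to \r\n when writing).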
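The same slicing idea could be applied to every .txt file in a folder, rewriting each file in place as the question describes. This is a rough sketch under stated assumptions: fix_name and rewrite_folder are hypothetical names, every file name is assumed to end in three digits plus ".jpg", and lines whose first field does not end in ".jpg" are dropped.

```python
from pathlib import Path

def fix_name(name):
    """Insert "_" before the last seven characters ("NNN.jpg") unless it is already there."""
    if name[-8:-7] == '_':
        return name
    return name[:-7] + '_' + name[-7:]

def rewrite_folder(folder):
    """Rewrite each .txt file in folder so that it keeps only the fixed file names."""
    for txt in Path(folder).glob('*.txt'):
        names = []
        for line in txt.read_text().splitlines():
            parts = line.split(None, 1)
            # Skip blank lines and lines that do not start with a .jpg name.
            if parts and parts[0].endswith('.jpg'):
                names.append(fix_name(parts[0]))
        # The whole file is read before it is overwritten, so no temporary file is needed.
        txt.write_text(''.join(name + '\n' for name in names))
```

Calling rewrite_folder with the directory path from the question would overwrite the files, so it is worth trying on a copy of the folder first.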