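I want to open a file with Python on Windows, run some regular expression operations on it, optionally change the content, and then write the result back to the file.
I can create an example file that looks correct (based on comments about using binary mode in other SO posts and in the docs). What I can't see is how to convert the "binary" data into a usable form without introducing '\r' characters.
An example: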
import re
# Create an example file which represents the one I'm actually working on (a Jenkins config file if you're interested).
testFileName = 'testFile.txt'
with open(testFileName, 'wb') as output_file:
    output_file.write(b'this\nis\na\ntest')
# Try and read the file in as I would in the script I was trying to write.
content = ""
with open(testFileName, 'rb') as content_file:
    content = content_file.read()
# Do something to the content
exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content) # <-- Fails because it won't operate on 'binary data'
# Write the file back to disk and then realise, frustratingly, that something in this process has introduced carriage returns onto every line.
outputFilename = 'output_'+testFileName
with open(outputFilename, 'wb') as output_file:
    output_file.write(content)
Answer 0 (score: 2)
I think you mean that your text file contains carriage returns and you don't want them included in the text.
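If you open the file in text mode with
with open(fileName, 'r', encoding="utf-8", errors="ignore", newline=None) as content_file:
or, more specifically, leave newline at its default of None in your open call, Python reads in universal-newlines mode and translates each "\r\n" to "\n", so the carriage returns are consumed. Note that newline="\r\n" does the opposite on read: lines are split only on "\r\n" and the line endings are returned untranslated.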
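The effect of the newline argument on reading can be checked with a small sketch. It assumes a throwaway file in a temporary directory; the helper name read_with_newline and the file name sample.txt are made up for illustration and are not part of the original question.

```python
import os
import tempfile

def read_with_newline(raw: bytes, newline):
    """Write raw bytes to a temporary file, then read it back in text mode."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(raw)
        with open(path, "r", encoding="utf-8", newline=newline) as f:
            return f.read()

raw = b"this\r\nis\r\na\r\ntest"

# newline=None (the default): universal newlines, '\r\n' is translated to '\n'.
translated = read_with_newline(raw, None)

# newline="\r\n": lines end only at '\r\n', and the endings are left untranslated.
untranslated = read_with_newline(raw, "\r\n")

print(repr(translated))
print(repr(untranslated))
```

If the goal is a string without '\r' that a regex written against '\n' can match, the default newline=None is the setting that produces it on read.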
Edit: or, if you only want to operate on \n, then this working example should do it.
import re
testFileName = 'testFile.txt'
# newline='\n' disables newline translation on write, so no '\r' is added on Windows.
with open(testFileName, 'w', newline='\n') as output_file:
    output_file.write('this\nis\na\ntest')
content = ""
# newline='\n' on read returns the line endings untranslated.
with open(testFileName, 'r', newline='\n') as content_file:
    content = content_file.read()
exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content)
outputFilename = 'output_'+testFileName
with open(outputFilename, 'w', newline='\n') as output_file:
    output_file.write(content)
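One likely source of the unexpected carriage returns is writing in text mode with the default newline on Windows, where each '\n' is translated to os.linesep ('\r\n'). The sketch below makes that visible on any platform by passing newline="\r\n" explicitly to imitate the Windows default. The helper name bytes_written and the file name out.txt are assumptions made for illustration.

```python
import os
import tempfile

def bytes_written(text: str, newline) -> bytes:
    """Write text in text mode with the given newline setting and return the raw bytes on disk."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "out.txt")  # hypothetical file name
        with open(path, "w", encoding="utf-8", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()

text = "this\nis\na\nworking\ntest"

# newline='\n': no translation takes place, the bytes match the string.
unchanged = bytes_written(text, "\n")

# newline='\r\n': every '\n' becomes '\r\n', which is what the default
# (newline=None) does on Windows because os.linesep is '\r\n' there.
crlf = bytes_written(text, "\r\n")

print(unchanged)
print(crlf)
```

Reading the result back in binary mode, as done here, is a simple way to confirm exactly which line endings ended up in the file.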
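Answer 1 (score: 1)
If I have interpreted the question correctly, I would first decode the bytes into a string and then perform the regex substitution. After that, I would encode the string back into bytes to write to the output file.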
import re
testFileName = 'testFile.txt'
with open(testFileName, 'wb') as output_file:
    output_file.write(b'this\nis\na\ntest')
content = ""
# Read the raw bytes and decode them to str so the str regex can operate on them.
with open(testFileName, 'rb') as content_file:
    content = content_file.read().decode('utf-8')
exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content)
outputFilename = 'output_'+testFileName
# Encode back to bytes; binary mode performs no newline translation.
with open(outputFilename, 'wb') as output_file:
    output_file.write(content.encode('utf-8'))
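As an alternative to decoding and re-encoding, the re module also accepts bytes patterns, so the substitution can run directly on the data read in binary mode. The following is a minimal sketch of that idea rather than the approach from the answer above. It works on an in-memory bytes value instead of a file, and the variable names are chosen for illustration.

```python
import re

# Stand-in for the bytes returned by content_file.read() in 'rb' mode.
content = b"this\nis\na\ntest"

# Both the pattern and the replacement must be bytes when the input is bytes.
example_regex = re.compile(rb"a\ntest")
result = example_regex.sub(rb"a\nworking\ntest", content)

print(result)
```

Mixing a str pattern with bytes input raises a TypeError, which is the failure noted in the question, so the pattern, the replacement, and the content all need to be the same type.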