Question

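I want to use Python on Windows to open a file, run some regular expression operations on it, optionally change the content, and then write the result back to the file.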

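I can create an example file that looks correct (based on comments in other posts on SO and in the documentation about using binary mode). What I can't see is how to convert the 'binary' data into a usable form without introducing '\r' characters.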
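For context, the sketch below shows where such '\r' characters typically come from. In text mode, Python translates every '\n' it writes into the newline string. On Windows, the default (newline=None) uses os.linesep, which is '\r\n'. Passing newline='\r\n' explicitly reproduces that behaviour on any platform. The file name demo.txt is a hypothetical placeholder, created in a temporary directory.

```python
import os
import tempfile

# Hypothetical file name, created in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode with newline='\r\n' mimics the Windows default (os.linesep == '\r\n').
with open(path, "w", encoding="utf-8", newline="\r\n") as f:
    f.write("this\nis\na\ntest")

# Reading the raw bytes back shows that every '\n' was written as '\r\n'.
with open(path, "rb") as f:
    raw = f.read()

print(raw)  # b'this\r\nis\r\na\r\ntest'
```

Binary mode ('wb'/'rb') performs no such translation. That is why writing bytes directly never adds '\r' by itself.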

An example:

import re

# Create an example file which represents the one I'm actually working on (a Jenkins config file if you're interested).
testFileName = 'testFile.txt'
with open(testFileName, 'wb') as output_file:
    output_file.write(b'this\nis\na\ntest')

# Try and read the file in as I would in the script I was trying to write.
content = ""
with open(testFileName, 'rb') as content_file:
    content = content_file.read()

# Do something to the content
exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content) # <-- Fails with TypeError: a str pattern can't be applied to bytes

# Write the file back to disk and then realise, frustratingly, that something in this process has introduced carriage returns onto every line.
outputFilename = 'output_'+testFileName
with open(outputFilename, 'wb') as output_file:
    output_file.write(content)
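As an aside, the TypeError on the sub() line happens because a str pattern is being applied to bytes. One alternative is to stay in bytes the whole way, with a bytes pattern and a bytes replacement. This is a minimal sketch that assumes the file's encoding is ASCII-compatible (for example UTF-8), so that b"\n" is the real line break. The temporary directory is only there to make the example self-contained.

```python
import os
import re
import tempfile

# Recreate the example file in binary mode (only '\n' line endings).
folder = tempfile.mkdtemp()
test_file_name = os.path.join(folder, "testFile.txt")
with open(test_file_name, "wb") as output_file:
    output_file.write(b"this\nis\na\ntest")

with open(test_file_name, "rb") as content_file:
    content = content_file.read()  # bytes, not str

# A bytes pattern and a bytes replacement work directly on bytes.
example_regex = re.compile(b"a\ntest")
content = example_regex.sub(b"a\nworking\ntest", content)

# Binary mode writes the bytes unchanged, so no '\r' is added.
output_filename = os.path.join(folder, "output_testFile.txt")
with open(output_filename, "wb") as output_file:
    output_file.write(content)

with open(output_filename, "rb") as check_file:
    written = check_file.read()
print(written)  # b'this\nis\na\nworking\ntest'
```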

Answer 1

I think you mean that your text file contains carriage returns and you don't want them included in the text.

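If you open the file in text mode, for example with open(fileName, 'r', encoding="utf-8", errors="ignore") as content_file, the default newline=None enables universal newlines mode. Every '\r\n' is then translated to '\n' as the file is read, so the carriage returns are consumed.

Note that setting newline="\r\n" explicitly in the open call does not do this. When reading, any value other than None (including '' and '\r\n') returns the line endings untranslated.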
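Here is a quick check of how the newline argument behaves when reading a file with CRLF endings. The file name crlf.txt is hypothetical, and the file is written in binary mode to simulate one saved by a Windows tool.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "crlf.txt")  # hypothetical file name
# Simulate a file saved with Windows (CRLF) line endings.
with open(path, "wb") as f:
    f.write(b"this\r\nis\r\na\r\ntest")

# Default newline=None: universal newlines, '\r\n' is translated to '\n'.
with open(path, "r", encoding="utf-8") as f:
    translated = f.read()

# newline='\r\n': lines end at '\r\n', but the ending is returned untranslated.
with open(path, "r", encoding="utf-8", newline="\r\n") as f:
    untranslated = f.read()

print(repr(translated))    # 'this\nis\na\ntest'
print(repr(untranslated))  # 'this\r\nis\r\na\r\ntest'
```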

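Edit: Alternatively, if you only want to work with '\n', this working example should do it. Passing newline='\n' when writing disables newline translation, so no '\r' is added: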

import re

testFileName = 'testFile.txt'
with open(testFileName, 'w', newline='\n') as output_file:
    output_file.write('this\nis\na\ntest')

content = ""
with open(testFileName, 'r', newline='\n') as content_file:
    content = content_file.read()

exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content)

outputFilename = 'output_'+testFileName
with open(outputFilename, 'w', newline='\n') as output_file:
    output_file.write(content)
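A related variant may help when the file's existing line endings should be preserved exactly as found, whether '\n' or '\r\n'. Passing newline='' on both read and write turns off translation in both directions. The regex then has to handle the optional '\r' itself. This sketch captures the line ending in a group and reuses it in the replacement. The file names are hypothetical.

```python
import os
import re
import tempfile

folder = tempfile.mkdtemp()
src = os.path.join(folder, "mixed.txt")          # hypothetical file names
dst = os.path.join(folder, "output_mixed.txt")

# A file with CRLF endings, as a Windows editor might save it.
with open(src, "wb") as f:
    f.write(b"this\r\nis\r\na\r\ntest")

# newline='' on read: no translation, so '\r\n' stays in the string.
with open(src, "r", encoding="utf-8", newline="") as f:
    content = f.read()

# Capture whichever line ending is present (\r?\n) and write it back unchanged.
content = re.sub(r"a(\r?\n)test", r"a\1working\1test", content)

# newline='' on write: characters are written as-is, nothing is added.
with open(dst, "w", encoding="utf-8", newline="") as f:
    f.write(content)

with open(dst, "rb") as f:
    result = f.read()
print(result)  # b'this\r\nis\r\na\r\nworking\r\ntest'
```

Because of the capture group, the same code also works unchanged on a file that uses plain '\n' endings.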

Answer 2

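If I've interpreted the question correctly, the approach is to decode the bytes into a string first and then run the regex substitution. Afterwards, encode the string back into bytes and write those to the output file. Because both files are opened in binary mode, no newline translation takes place and no '\r' is introduced.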

import re

testFileName = 'testFile.txt'
with open(testFileName, 'wb') as output_file:
    output_file.write(b'this\nis\na\ntest')

content = ""
with open(testFileName, 'rb') as content_file:
    content = content_file.read().decode('utf-8')

exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content)

outputFilename = 'output_'+testFileName
with open(outputFilename, 'wb') as output_file:
    output_file.write(content.encode('utf-8'))
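The same decode, substitute, encode steps can be wrapped in a small reusable function that edits a file in place and only rewrites it when the content actually changed. This matches the "optionally change the content" part of the question. The helper name rewrite_file and its signature are hypothetical, and UTF-8 is assumed as the default encoding.

```python
import os
import re
import tempfile


def rewrite_file(path, pattern, repl, encoding="utf-8"):
    """Hypothetical helper: decode, substitute, re-encode, write back in place.

    Returns True if the file was modified, False otherwise.
    """
    with open(path, "rb") as f:
        text = f.read().decode(encoding)
    new_text = re.sub(pattern, repl, text)
    if new_text == text:
        return False  # nothing matched, leave the file untouched
    # Binary mode: the encoded bytes are written exactly, no '\r' added.
    with open(path, "wb") as f:
        f.write(new_text.encode(encoding))
    return True


path = os.path.join(tempfile.mkdtemp(), "testFile.txt")
with open(path, "wb") as f:
    f.write(b"this\nis\na\ntest")

changed = rewrite_file(path, r"a\ntest", r"a\nworking\ntest")
with open(path, "rb") as f:
    data = f.read()
print(changed, data)  # True b'this\nis\na\nworking\ntest'
```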

How do I read/write files in Python 3 on Windows without introducing carriage returns?
