How can I read/write files in Python (3) on Windows without introducing carriage returns?

Time: 2015-09-14 14:32:15

Tags: python regex file python-3.x

I want to use Python on Windows to open a file, perform some regex operations, optionally change the content, and then write the result back to the file.

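I can create an example file that looks correct (based on comments about using binary mode in other SO posts and in the docs). What I can't see is how to convert the 'binary' data into a usable form without introducing '\r' characters.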

An example:

import re

# Create an example file which represents the one I'm actually working on (a Jenkins config file if you're interested).
testFileName = 'testFile.txt'
with open(testFileName, 'wb') as output_file:
    output_file.write(b'this\nis\na\ntest')

# Try and read the file in as I would in the script I was trying to write.
content = ""
with open(testFileName, 'rb') as content_file:
    content = content_file.read()

# Do something to the content
exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content) # <-- Fails because it won't operate on 'binary data'

# Write the file back to disk and then realise, frustratingly that something in this process has introduced carriage returns onto every line.
outputFilename = 'output_'+testFileName
with open(outputFilename, 'wb') as output_file:
    output_file.write(content)

2 answers:

Answer 0 (score: 2)

I think you mean that your text file has carriage returns and you don't want them included in the text.

If you use

with open(fileName, 'r', encoding="utf-8", errors="ignore", newline="\r\n") as content_file:

or, more specifically, set newline="\r\n" in your open call, it should consume the carriage returns at the line endings.
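It may be worth checking that suggestion against the documented behaviour of the `newline` argument. According to the `open()` documentation, reading with the default `newline=None` translates '\r\n' to '\n', whereas reading with `newline='\r\n'` returns the line endings untranslated, so the default may be closer to what is wanted here. The sketch below compares the two; the helper name `read_with_newline` and the sample bytes are illustrative assumptions, not part of the original question.

```python
import os
import tempfile


def read_with_newline(raw: bytes, newline):
    """Write raw bytes to a temporary file, then read them back in text mode.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(raw)
        with open(path, 'r', encoding='utf-8', newline=newline) as f:
            return f.read()
    finally:
        os.remove(path)


# Sample content with Windows-style line endings (an assumption for the demo).
raw = b'this\r\nis\r\na\r\ntest'

# newline=None (the default): '\r\n' is translated to '\n' on read.
default_read = read_with_newline(raw, None)

# newline='\r\n': lines end only at '\r\n', and the endings are kept as-is.
crlf_read = read_with_newline(raw, '\r\n')
```

Printing `repr(default_read)` and `repr(crlf_read)` makes the difference visible, since `repr` shows any remaining '\r' characters explicitly.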

Edit: or, if you only want to operate on \n, this working example should do it.

import re

testFileName = 'testFile.txt'
with open(testFileName, 'w', newline='\n') as output_file:
    output_file.write('this\nis\na\ntest')

content = ""
with open(testFileName, 'r', newline='\n') as content_file:
    content = content_file.read()

exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content)

outputFilename = 'output_'+testFileName
with open(outputFilename, 'w', newline='\n') as output_file:
    output_file.write(content)
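One way to confirm that no carriage returns were written is to read the output back in binary mode and inspect the raw bytes. This is a minimal sketch under stated assumptions: the helper names `write_text_lf` and `read_raw` and the file name `lf_check.txt` are made up for illustration, and the file is created in a temporary directory rather than next to the script.

```python
import os
import tempfile


def write_text_lf(path, text):
    """Write text with newline='\\n', so '\\n' is not translated on output."""
    with open(path, 'w', encoding='utf-8', newline='\n') as f:
        f.write(text)


def read_raw(path):
    """Return the file's bytes exactly as stored on disk."""
    with open(path, 'rb') as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'lf_check.txt')  # hypothetical file name
    write_text_lf(path, 'this\nis\na\ntest')
    raw_bytes = read_raw(path)

# True when the file on disk contains no carriage returns.
has_no_cr = b'\r' not in raw_bytes
```

Without `newline='\n'`, text mode on Windows would be expected to write each '\n' as '\r\n', which is where the unwanted characters usually come from.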

Answer 1 (score: 1)

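If I've interpreted the question correctly, I would first decode the bytes into a string and then perform the regex substitution. After that, I would encode the string back into bytes to write to the output file.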

import re

testFileName = 'testFile.txt'
with open(testFileName, 'wb') as output_file:
    output_file.write(b'this\nis\na\ntest')

content = ""
with open(testFileName, 'rb') as content_file:
    content = content_file.read().decode('utf-8')

exampleRegex = re.compile("a\\ntest")
content = exampleRegex.sub("a\\nworking\\ntest", content)

outputFilename = 'output_'+testFileName
with open(outputFilename, 'wb') as output_file:
    output_file.write(content.encode('utf-8'))
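As an alternative sketch, the `re` module also accepts bytes patterns, so the substitution can be applied to the binary content directly without the decode/encode round trip. The pattern and the replacement must both be bytes when the subject is bytes; mixing `str` and `bytes` raises a `TypeError`, which would explain the failure in the question. The function name `replace_in_bytes` is an illustrative assumption.

```python
import re


def replace_in_bytes(data: bytes) -> bytes:
    """Apply the example substitution directly to bytes (hypothetical helper)."""
    # Bytes pattern: the regex escape \n matches a line feed byte.
    pattern = re.compile(rb"a\ntest")
    # The replacement is bytes too, containing literal line feeds.
    return pattern.sub(b"a\nworking\ntest", data)


original = b'this\nis\na\ntest'
updated = replace_in_bytes(original)
```

Because the data never passes through text mode, nothing translates the line endings, and `updated` can be written back with `open(outputFilename, 'wb')` unchanged.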