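I have the following problem. I have a function that reads a .txt file and converts it to a string. However, doing so loses all of the file's paragraph breaks. For example, if my .txt file contains the following: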
Hello everyone I have a problem with reading a file and turning it into a string.
This is a new paragraph, however it is lost once converted to a string.
And this is another paragraph as well.
After reading this .txt file, I get the following string:
Hello everyone I have a problem with reading a file and turning it into a string.This is a new paragraph, however it is lost once converted to a string.And this is another paragraph as well.
In other words, all the paragraph breaks have disappeared.
The command I currently use to read the file is:
data = iom.read_file_contents(sys.argv[1])
where read_file_contents is a function in the following module, named iom:
import io

def read_file_contents(name):
    return open(name).read()

def write_file_contents(name, text):
    with io.open(name, 'w', encoding='utf-8') as outfile:  # creates .txt file
        outfile.write(unicode(text))
Any help is greatly appreciated. As requested, my full code is below:
data = iom.read_file_contents(sys.argv[1])
for i in data:
    # removes every character that is not an ASCII letter, digit, punctuation mark, or ' '
    if i not in string.ascii_letters and i not in n and i not in string.punctuation and i != ' ':
        data = data.replace(i, "")
# rewrites the input .txt file without the removed characters
iom.write_file_contents(sys.argv[1], data)
output = sub.substitute(data, rotation)
iom.write_file_contents(sys.argv[2], output)
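That is, I read a file, rewrite it after removing all the "weird" characters, and then call the substitution function with the input string and a dictionary that maps letters to other letters (encrypting the input):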
def substitute(str, cipher):  # substitution cipher, takes a string (which will be substituted) and a dictionary
    result = ""
    n = '0123456789'
    for c in str:
        if c in string.uppercase or c in string.lowercase:
            result = result + cipher[c]
        elif c == ' ' or c in n or c in string.punctuation:
            result = result + c
    return result
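The output of the substitution function is then written to a new .txt file.
Answer 0: (score: 0)
This loop also removes the newline characters, which you need in order to keep the paragraphs: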
for i in data:
    # removes every character that is not an ASCII letter, digit, punctuation mark, or ' ' (so '\n' is removed too)
    if i not in string.ascii_letters and i not in n and i not in string.punctuation and i != ' ':
        data = data.replace(i, "")
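It is ugly, but the following should avoid stripping the newlines. Håken's answer is better, because it simplifies the search for "bad" characters.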
for i in data:
    # removes every character that is not an ASCII letter, digit, punctuation mark, ' ', or '\n'
    if i not in string.ascii_letters and i not in n and i not in string.punctuation and i != ' ' and i not in '\n':
        data = data.replace(i, "")
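The same filtering can also be done in a single pass instead of calling replace inside the loop. This is only a sketch: it assumes Python 3, and the names ALLOWED and strip_unwanted are made up for illustration and do not appear in the original code.

```python
import string

# Characters to keep: ASCII letters, digits, punctuation, space, and newline.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " \n")


def strip_unwanted(text):
    """Return text with every character outside ALLOWED removed (hypothetical helper)."""
    return "".join(ch for ch in text if ch in ALLOWED)


sample = "Hello φ world.\nSecond\tparagraph."
print(strip_unwanted(sample))
```

Because '\n' is in the allowed set, the paragraph breaks survive, while characters such as φ or tabs are dropped.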
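Answer 1: (score: 0)
"I rewrite it by removing all the 'weird' characters", such as φ,
You are also removing all whitespace except " ".
How about doing it the other way around?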
letters = string.letters
non_letters = string.punctuation + string.digits + string.whitespace
for c in input_string:
    if c in letters:
        result += cipher[c]
    elif c in non_letters:
        result += c
If you only want to keep some of the whitespace characters, you can pick which ones:
non_letters = string.punctuation + string.digits + ' ' + '\n'
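Put together as a complete function, this might look like the sketch below. It assumes Python 3, where string.letters no longer exists and string.ascii_letters is used instead. The rotation dictionary is not shown in the question, so make_rotation is a hypothetical helper that builds a simple shifted alphabet just for the demonstration.

```python
import string

LETTERS = string.ascii_letters
# Keep punctuation, digits, spaces and newlines; other whitespace is dropped.
NON_LETTERS = string.punctuation + string.digits + " " + "\n"


def make_rotation(shift):
    """Hypothetical helper: map each ASCII letter to the one `shift` places later."""
    cipher = {}
    for alphabet in (string.ascii_lowercase, string.ascii_uppercase):
        for index, letter in enumerate(alphabet):
            cipher[letter] = alphabet[(index + shift) % len(alphabet)]
    return cipher


def substitute(text, cipher):
    """Map letters through cipher and copy the characters in NON_LETTERS unchanged."""
    pieces = []
    for c in text:
        if c in LETTERS:
            pieces.append(cipher[c])
        elif c in NON_LETTERS:
            pieces.append(c)
        # Anything else (tabs, non-ASCII characters, ...) is silently dropped.
    return "".join(pieces)


rotation = make_rotation(3)
print(substitute("Hello everyone.\nThis is a new paragraph.", rotation))
```

With this version the cleanup loop before the call is no longer needed, since unwanted characters are skipped during substitution and newlines are passed through.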