Question

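I have the following problem. I have a function that reads a .txt file and turns it into a string. However, doing so loses all of the file's paragraphs. For example, if my .txt file contains the following: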

Hello everyone I have a problem with reading a file and turning it into a string.

This is a new paragraph, however it is lost once converted to a string.

And this is another paragraph as well.

Now, after reading this .txt file, I get the following string:

Hello everyone I have a problem with reading a file and turning it into a string.This is a new paragraph, however it is lost once converted to a string.And this is another paragraph as well.

Meaning all the paragraphs are gone.

The command I currently use to read this file is:

data = iom.read_file_contents(sys.argv[1])

and read_file_contents is a function in the following module, named iom:

import io


def read_file_contents(name):
    return open(name).read()


def write_file_contents(name, text):
    with io.open(name, 'w', encoding='utf-8') as outfile:  # creates the .txt file
        outfile.write(unicode(text))

Any help is much appreciated. As requested, here is my complete code:

data = iom.read_file_contents(sys.argv[1])


for i in data:
    if i not in string.ascii_letters and i not in n and i not in string.punctuation and i !=' ': #removes all non ascii, numbers, punctuation and ' ' characters
        data = data.replace(i,"")


iom.write_file_contents(sys.argv[1],data)  #rewrites the input .txt file by erasing all non ascii, numbers, punctuation and ' ' characters
output = sub.substitute(data, rotation)
iom.write_file_contents(sys.argv[2], output)

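Meaning I read a file, rewrite it by deleting all "weird" characters, and then call the substitute function with the input string and a dictionary that maps letters to other letters (encrypting the input):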

def substitute(str, cipher):      #substitution cipher, takes a string (which will be substituted) and a dictionary


    result = ""
    n = '0123456789'
    for c in str:
        if c in string.uppercase or c in string.lowercase:
            result = result + cipher[c]
        elif c==' ' or c in n or c in string.punctuation:
            result = result + c

    return result

The output of the substitute function is then written to a new .txt file.

Answer 1

This is also replacing the newlines, which you need in order to have paragraphs.

for i in data:
    if i not in string.ascii_letters and i not in n and i not in string.punctuation and i !=' ': #removes all non ascii, numbers, punctuation and ' ' characters
        data = data.replace(i,"")

Although it's ugly, this should avoid stripping the newlines. Håken's answer is better, as it simplifies the search for "bad" characters.

for i in data:
    if i not in string.ascii_letters and i not in n and i not in string.punctuation and i !=' ' and i not in '\n': #removes everything except ASCII letters, digits, punctuation, ' ' and newlines
        data = data.replace(i,"")
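
For reference, the same filter can also be written as a single pass over the text instead of calling replace inside the loop. This is only a sketch, not the asker's code: `clean` and `ALLOWED` are names made up here, `string.digits` stands in for the `n` variable (assumed to be '0123456789'), and it is written so that it also runs on Python 3.

```python
import string

# Characters to keep: letters, digits, punctuation, space and newline.
# ALLOWED and clean are hypothetical names, not part of the question's code.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " \n")


def clean(text):
    """Return text with every character outside ALLOWED removed."""
    return "".join(ch for ch in text if ch in ALLOWED)


sample = "First paragraph.\n\nSecond paragraph with a \u03c6 in it.\n"
cleaned = clean(sample)
print(repr(cleaned))
```

Because '\n' is in the allowed set, the blank line between the two paragraphs is still there after cleaning, while the φ is dropped.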

Answer 2

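You say you rewrite the file by deleting all "weird" characters, such as φ. In doing so you are also deleting all whitespace other than " ", which includes the newlines that separate your paragraphs.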
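
A quick way to see this is to run the question's condition on a few whitespace characters. The helper name `is_removed` is made up for this check, and `n` is assumed to be the digit string used in the question.

```python
import string

n = "0123456789"  # assumed to match the n used in the question


def is_removed(ch):
    """Same condition as the question's filter: True means the character gets deleted."""
    return (ch not in string.ascii_letters and ch not in n
            and ch not in string.punctuation and ch != " ")


# Print which of these characters the filter would delete.
for ch in [" ", "\n", "\t", "\r", "a", "\u03c6"]:
    print(repr(ch), is_removed(ch))
```

Only the space and the letter pass the filter; the newline, tab and carriage return are deleted along with φ.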

How about doing it the other way around instead?

letters = string.letters
non_letters = string.punctuation + string.digits + string.whitespace

for c in input_string:
    if c in letters:
        result += cipher[c]
    elif c in non_letters:
        result += c

If you only want to keep some of the whitespace characters, you can pick which ones:

non_letters = string.punctuation + string.digits + ' ' + '\n'
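
Put together, the approach might look like the sketch below. It is not the asker's actual `sub` module: the `substitute` signature follows the question, `make_rotation` is a hypothetical helper that builds a simple rotation cipher for the demo, and `string.ascii_letters` is used instead of `string.letters` so that the code also runs on Python 3.

```python
import string

# Non-letter characters passed through unchanged; newline is included
# so that paragraph breaks survive.
NON_LETTERS = string.punctuation + string.digits + " " + "\n"


def make_rotation(shift):
    """Hypothetical helper: build a dict that rotates each letter by shift places."""
    cipher = {}
    for alphabet in (string.ascii_lowercase, string.ascii_uppercase):
        for i, letter in enumerate(alphabet):
            cipher[letter] = alphabet[(i + shift) % len(alphabet)]
    return cipher


def substitute(text, cipher):
    """Map letters through cipher, keep NON_LETTERS as-is, drop everything else."""
    result = []
    for c in text:
        if c in string.ascii_letters:
            result.append(cipher[c])
        elif c in NON_LETTERS:
            result.append(c)
    return "".join(result)


text = "Hello everyone.\n\nThis is a new paragraph."
encrypted = substitute(text, make_rotation(13))
print(encrypted)
```

With this version there is no need for a separate cleaning loop before the call, since anything that is neither a letter nor in `NON_LETTERS` is simply skipped.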

Preserving paragraphs when reading a .txt file