Preserving paragraphs when reading a .txt file

Asked: 2015-01-30 16:21:54

Tags: python string io

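I have the following problem. I have a function that reads a .txt file and turns it into a string. However, doing so loses all of the file's paragraphs. For example, if my .txt file contains the following: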


Hello everyone I have a problem with reading a file and turning it into a string.

This is a new paragraph, however it is lost once converted to a string.

And this is another paragraph as well.

After reading this .txt file, I get the following string:


Hello everyone I have a problem with reading a file and turning it into a string.This is a new paragraph, however it is lost once converted to a string.And this is another paragraph as well.

That is, all the paragraph breaks are gone.

The command I currently use to read the file is:

data = iom.read_file_contents(sys.argv[1])

and read_file_contents is a function in the following module, named iom:

import io


def read_file_contents(name):
    with open(name) as infile:  # close the file once it has been read
        return infile.read()


def write_file_contents(name, text):
    with io.open(name, 'w', encoding='utf-8') as outfile:  # creates the .txt file
        outfile.write(unicode(text))

Any help is much appreciated. As requested, here is my full code:

data = iom.read_file_contents(sys.argv[1])


for i in data:
    if i not in string.ascii_letters and i not in n and i not in string.punctuation and i !=' ': #removes everything that is not an ASCII letter, a digit, punctuation or ' '
        data = data.replace(i,"")


iom.write_file_contents(sys.argv[1],data)  #rewrites the input .txt file by erasing all non ascii, numbers, punctuation and ' ' characters
output = sub.substitute(data, rotation)
iom.write_file_contents(sys.argv[2], output)

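That is, I read a file, rewrite it with all the "weird" characters removed, and then call the substitution function with the input string and a dictionary that maps letters to other letters (encrypting the input):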

def substitute(str, cipher):      #substitution cipher, takes a string (which will be substituted) and a dictionary


    result = ""
    n = '0123456789'
    for c in str:
        if c in string.uppercase or c in string.lowercase:
            result = result + cipher[c]
        elif c==' ' or c in n or c in string.punctuation:
            result = result + c

    return result

The output of the substitution function is then written to a new .txt file.

2 answers:

Answer 0 (score: 0)

This loop also removes the newline characters, which you need in order to form paragraphs:

for i in data:
    if i not in string.ascii_letters and i not in n and i not in string.punctuation and i !=' ': #removes everything that is not an ASCII letter, a digit, punctuation or ' '
        data = data.replace(i,"")

It is ugly, but the following should avoid stripping the newlines. Håken's answer is better, since it simplifies the search for "bad" characters.

for i in data:
    if i not in string.ascii_letters and i not in n and i not in string.punctuation and i !=' ' and i != '\n': #removes everything except ASCII letters, digits, punctuation, ' ' and '\n'
        data = data.replace(i,"")
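
As a possible alternative to calling data.replace once per character, the same filter can be written as a single pass over the text. This is only a sketch, and it assumes Python 3; the names ALLOWED and keep_allowed are made up for illustration and are not part of the asker's code.

```python
import string

# Characters to keep: ASCII letters, digits, punctuation, space and newline.
# ALLOWED and keep_allowed are hypothetical names used only for this sketch.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " \n")


def keep_allowed(text):
    """Return text with every character outside ALLOWED removed."""
    return "".join(ch for ch in text if ch in ALLOWED)


# The sample contains a non-ASCII letter (phi) and a trailing tab.
sample = "First paragraph \u03c6.\n\nSecond paragraph.\t"
cleaned = keep_allowed(sample)
print(repr(cleaned))
```

Because '\n' is in the allowed set, the blank line between the two paragraphs survives, while the non-ASCII letter and the tab are dropped.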

Answer 1 (score: 0)

  

"I rewrite it by removing all the "weird" characters", like φ,

You are also removing all whitespace other than " ".

How about doing it the other way around instead?

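import string

letters = string.letters  # Python 2; use string.ascii_letters on Python 3
non_letters = string.punctuation + string.digits + string.whitespace

result = ""
for c in input_string:
    if c in letters:
        result += cipher[c]  # encrypt letters through the cipher dictionary
    elif c in non_letters:
        result += c  # copy punctuation, digits and whitespace (including '\n') unchanged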

If you only want to keep some of the whitespace characters, you can pick which ones:

non_letters = string.punctuation + string.digits + ' ' + '\n'
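
Putting the pieces together, a Python 3 sketch of the substitution step that keeps paragraph breaks might look like the following. The make_rotation helper and the keep parameter are assumptions added for illustration; the question does not show how its rotation dictionary is built.

```python
import string


def make_rotation(shift):
    """Build a hypothetical Caesar-style cipher dictionary for ASCII letters."""
    cipher = {}
    for alphabet in (string.ascii_lowercase, string.ascii_uppercase):
        for i, letter in enumerate(alphabet):
            cipher[letter] = alphabet[(i + shift) % len(alphabet)]
    return cipher


def substitute(text, cipher, keep=string.punctuation + string.digits + " \n"):
    """Encrypt letters through cipher and copy characters in keep unchanged."""
    result = []
    for c in text:
        if c in cipher:
            result.append(cipher[c])
        elif c in keep:
            result.append(c)
        # anything else (for example tabs or non-ASCII characters) is dropped
    return "".join(result)


rotation = make_rotation(3)
plain = "Hello everyone.\n\nThis is a new paragraph."
encrypted = substitute(plain, rotation)
print(encrypted)
```

Since '\n' is part of the default keep string, the blank line between paragraphs is carried through to the encrypted text. Passing a different keep value is one way to choose which whitespace characters survive.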