Combining lines (removing line numbers) to create paragraphs from a text file in Python

Time: 2014-12-23 16:22:51

Tags: python

I have some rather unusual text that looks like this:

[1]  It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, co
[2]  Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover .
[3]  Limit of Liability and Disclaimer of Warranty: e authors have used their best e orts in preparing this book, and the information provided herein as is. e information provided is sold without warranty, either express or implied.
[4]  Neither the authors nor Cartwheel Web will be held liable for any damages to be caused either directly or indirectly by the contents of this book.
[5]  Trademarks: Rather than indicating every occurence of a trademarked name as such, this book uses the names only in an editorial fashion and to the bene t of the trademark owner with no intention of infringement of the trademark.

...that is, a line number in brackets, followed by the text of the line.

Normally, I would do this:

    fn = "fn.txt"
    with open(fn, "r") as myfile:
        data = myfile.read().strip()

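...However, each line starts with a line number ([1], [2], ...) followed by two spaces, and I want to remove both before storing the values in `data`. I would like to know how to do this in Python.

3 answers:

Answer 0: (score: 2)

You just need to split on the first run of whitespace and keep the rest of each line. So, using your 'fn' file: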

In [69]: with open('fn') as infile:
   ....:     data = [line.strip().split(None, 1)[1] for line in infile]
   ....:     

In [70]: data
Out[70]: 
["It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, co",
 "Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover .",
 'Limit of Liability and Disclaimer of Warranty: e authors have used their best e orts in preparing this book, and the information provided herein as is. e information provided is sold without warranty, either express or implied.',
 'Neither the authors nor Cartwheel Web will be held liable for any damages to be caused either directly or indirectly by the contents of this book.',
 'Trademarks: Rather than indicating every occurence of a trademarked name as such, this book uses the names only in an editorial fashion and to the bene t of the trademark owner with no intention of infringement of the trademark.']
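
Note that `split(None, 1)[1]` raises an `IndexError` on a blank line or on a line that contains a number but no text. The following is a minimal sketch of a more defensive variant that also joins the cleaned lines into one paragraph, as the title asks. The function name `strip_line_numbers`, the in-memory `sample` list, and the choice to join with a single space are assumptions made for illustration, not part of the answer above.

```python
def strip_line_numbers(lines):
    """Drop the leading '[n]' token from each line and return the remaining text."""
    cleaned = []
    for line in lines:
        # Split once, on the first run of whitespace: ['[n]', 'rest of the line'].
        parts = line.strip().split(None, 1)
        if len(parts) == 2:  # skip blank lines and lines without any text
            cleaned.append(parts[1])
    return cleaned


# Hypothetical sample standing in for the lines of the file.
sample = [
    "[1]  It is a long established fact.\n",
    "\n",
    "[2]  Many desktop publishing packages.\n",
]
paragraph = " ".join(strip_line_numbers(sample))
print(paragraph)
```

With a real file, the same function can be given the open file object instead of `sample`, since it only iterates over lines.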

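Answer 1: (score: 1)

The line numbers may have different lengths, but they presumably never contain the two-space pattern that separates the line number from the text, so you can rely on that pattern. The easiest way is to partition the string at it: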

number, spaces, line = line.partition('  ')
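
As a rough illustration of how `str.partition` behaves here: it returns a 3-tuple of the part before the separator, the separator itself, and the part after it. If the separator is missing, the whole string comes back as the first element and the other two are empty. The helper name `split_numbered_line` and the fallback for lines without a double space are assumptions for this sketch.

```python
def split_numbered_line(line):
    """Return (number, text) for a line such as '[12]  some text'."""
    number, sep, text = line.rstrip("\n").partition("  ")
    if not sep:
        # No double space found: treat the whole line as text with no number.
        return "", number
    return number, text


print(split_numbered_line("[12]  Neither the authors nor Cartwheel Web\n"))
```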

Answer 2: (score: 0)

You just need to find the first occurrence of two spaces:

>>> p = "fn.txt"  # path to the input file
>>> new_data = ""
>>> with open(p, "r") as myfile:
...     for i in myfile:
...         # keep everything after the first pair of spaces
...         new_data += i[i.find("  ") + 2:]
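
One caveat with this approach: `str.find` returns -1 when the pattern is absent, so the slice becomes `i[1:]` and silently drops the first character of any line that has no double space. Below is a small sketch that guards against that case. The helper name `text_after_double_space` is hypothetical.

```python
def text_after_double_space(line):
    """Return the part of line after the first pair of spaces, or line unchanged."""
    idx = line.find("  ")
    if idx == -1:
        # No double space: keep the line as it is rather than slicing from index 1.
        return line
    return line[idx + 2:]


lines = ["[3]  Limit of Liability\n", "no number on this line\n"]
new_data = "".join(text_after_double_space(line) for line in lines)
print(new_data)
```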