Question

I want to create a text file containing the 5th line of each of the 32 articles in my text file "Aberdeen2005.txt". I have split my file into sections like this:

import re
sections = []
current = []
with open("Aberdeen2005.txt") as f:
    for line in f:
        if re.search(r"(?i)\d+ of \d+ DOCUMENTS", line):
            sections.append("".join(current))
            current = [line]
        else:
            current.append(line)

print(len(sections))

To get the 5th lines, I am trying the following code:

for i in range(1,500):
    print(sections[i].readline(5))

But it doesn't work. Any ideas?

Kind regards!

Answer 1

I'm not sure I fully understood what you want. Is it something like this?

for a in sections:
    # each section is one joined string, so split it into lines first
    for i, line in enumerate(a.splitlines()):
        if i == 4:
            # 5th line
            print(line)

Answer 2

First, range(1,500) will most likely run past the end of sections and raise an IndexError. Using range(len(sections)) is safer, because it always matches the size of the list.
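A minimal sketch of the difference, using a made-up three-item list in place of the real sections (the `collect` helper is hypothetical and only there for illustration):

```python
# Hypothetical stand-in for the real `sections` list.
sections = ["first", "second", "third"]

def collect(indices):
    """Return the items at the given indices, or the IndexError message."""
    try:
        return [sections[i] for i in indices]
    except IndexError as exc:
        return "IndexError: {}".format(exc)

# A hard-coded upper bound runs past the end of the list.
print(collect(range(1, 500)))

# Sizing the range from the list itself always stays in bounds.
print(collect(range(len(sections))))
```

Note that range(1, 500) also starts at 1, so it would skip sections[0] even if the list were long enough.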

It may be more useful to keep current as a list, since it is already split into lines:

sections.append(current)

Then simply change .readline(5) to [4] to get the 5th element of the list (indexing starts at 0, so index 4 is the 5th line). It would look like this:

import re
sections = []
current = []
with open("Aberdeen2005.txt") as f:
    for line in f:
        if re.search(r"(?i)\d+ of \d+ DOCUMENTS", line):
            sections.append(current)  # removed "".join() so each section stays a list of lines
            current = [line]
        else:
            current.append(line)

print(len(sections))

for i in range(len(sections)):  # range(len(...)) instead of range(1,500)
    print(sections[i][4])  # changed .readline(5) to [4], since .readline() only works on files

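You ran into this because .readline() is a method of file objects. By the time a section is appended to the list it is a string (the result of "".join(current)), and str has no .readline method, so the call raises an AttributeError. Instead, you can split the string into lines: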

sections[i].split("\n")[4]
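As a small illustration, assuming a made-up five-line section joined the same way as in the question:

```python
# Hypothetical section text, joined the same way as in the question.
section = "".join(["line 1\n", "line 2\n", "line 3\n", "line 4\n", "line 5\n"])

# A str has no .readline method, which is where the AttributeError comes from.
print(hasattr(section, "readline"))

# Splitting on "\n" gives a list of lines that can be indexed.
fifth = section.split("\n")[4]
print(fifth)

# str.splitlines() is an alternative that also handles "\r\n" line endings.
fifth_alt = section.splitlines()[4]
print(fifth_alt)
```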

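"\n" is the newline character. It may not be present at the end of every line, depending on the operating system or other processing (for example, if you .strip() each line). With this approach sections contains only strings, which may be more to your liking: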

import re
sections = []
current = []
with open("Aberdeen2005.txt") as f:
    for line in f:
        if re.search(r"(?i)\d+ of \d+ DOCUMENTS", line):
            sections.append("".join(current))
            current = [line]
        else:
            current.append(line)

print(len(sections))

for i in range(len(sections)):  # range(len(...)) instead of range(1,500)
    print(sections[i].split("\n")[4])  # changed .readline(5) to .split("\n")[4]
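The loops above only print the lines, while the question asks for a new text file. One possible way to do that is sketched below. The function name `fifth_lines` and the output filename used in the note are assumptions, not part of the original code. The sketch also appends the final section after the loop (the loops above never append the last `current`) and skips empty sections and sections with fewer than 5 lines.

```python
import re

# Same pattern as in the question, compiled once.
HEADER = re.compile(r"(?i)\d+ of \d+ DOCUMENTS")

def fifth_lines(in_path, out_path, line_index=4):
    """Write line `line_index` (0-based) of every section to out_path.

    The header line ("N of M DOCUMENTS") counts as the first line of its
    section, as in the question's code.
    """
    sections = []
    current = []
    with open(in_path) as f:
        for line in f:
            if HEADER.search(line):
                if current:  # skip the empty section before the first header
                    sections.append(current)
                current = [line]
            else:
                current.append(line)
    if current:
        sections.append(current)  # the last section is not followed by a header

    # Keep only sections that are long enough to have the requested line.
    picked = [s[line_index] for s in sections if len(s) > line_index]
    with open(out_path, "w") as out:
        for line in picked:
            out.write(line if line.endswith("\n") else line + "\n")
    return picked
```

Calling fifth_lines("Aberdeen2005.txt", "fifth_lines.txt") would then write one line per article to the (hypothetically named) fifth_lines.txt. Any text before the first header is treated as a section of its own.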
