Question

I want to follow the answers given here:

How to only read lines in a text file after a certain string using python?

to read only the lines that come after a certain phrase, using either the boolean-flag approach or the second answer.

I need to get the numbers between the start and end tags from a file:

<type>
1 
2
3
</type>

But when I use this code:

found_type = False
t_ype = [] 
with open('test.xml', 'r') as f:
    for line in f:
        if '<type>' in line:
            found_type = True
        if found_type:
            if '</type>' in line:
                found_type = False
            else:    
                t_line = str(line).rstrip('\n')
                t_ype.append(t_line)

I can't skip the first line, and I get:

'<type>', '1','2','3'

when what I want is:

'1','2','3'

It does stop appending to the list when I hit </type>, which is fine, because I don't need that tag in my list either.

I'm not sure what I'm doing wrong, and I can't ask on that page because my reputation isn't high enough.

Answer 1

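After you detect the "header", you have to skip the rest of the for loop body. In your code you set found_type to True, and then the if found_type: check matches on that very same line, so the header line itself gets appended.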

found_type = False
t_ype = [] 
with open('test.xml', 'r') as f:
    for line in f:
        if '<type>' in line:
            found_type = True
            continue                    # This is the only change to your code.
                                        # When the header is found, immediately go to the next line
        if found_type:
            if '</type>' in line:
                found_type = False
            else:    
                t_line = str(line).rstrip('\n')
                t_ype.append(t_line)
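To check the fix without a file on disk, the same loop can be fed an in-memory file. This is a minimal sketch that assumes test.xml looks like the sample in the question with a proper </type> closing tag; io.StringIO is only a stand-in for the real file.

```python
import io

# Stand-in for test.xml (assumption: same layout as the sample in the question).
sample = io.StringIO("<type>\n1\n2\n3\n</type>\n")

found_type = False
t_ype = []
for line in sample:
    if '<type>' in line:
        found_type = True
        continue              # skip the header line itself
    if found_type:
        if '</type>' in line:
            found_type = False
        else:
            t_ype.append(line.rstrip('\n'))

print(t_ype)  # ['1', '2', '3']
```

Note that '<type>' is not a substring of '</type>', so the closing tag does not re-trigger the header check.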

Answer 2

The simplest approach is a double loop with yield:

def section(fle, begin, end):
    with open(fle) as f:
        for line in f:
            # found start of section so start iterating from next line
            if line.startswith(begin):
                for line in f: 
                    # found end so end function
                    if line.startswith(end):
                        return
                    # yield every line in the section
                    yield line.rstrip()

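Then just call list(section('test.xml','<type>','</type>')), or iterate with for line in section('test.xml','<type>','</type>'): and use the lines. If you have repeated sections, swap the return for a break. You also don't need to call str on the lines, since they are already strings. If you have a large file, the groupby approach from the comments may be a better choice.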
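A rough sketch of the "swap return for break" variant for repeated sections follows. To make it easy to try on in-memory text, it takes any iterable of lines instead of a filename; that change and the name sections are assumptions for illustration, not part of the answer above.

```python
import io
from typing import Iterable, Iterator


def sections(lines: Iterable[str], begin: str, end: str) -> Iterator[str]:
    """Yield the lines of every begin...end section, excluding the markers.

    Hypothetical variant of section(): accepts any iterable of lines.
    """
    it = iter(lines)  # for a file object this is the file itself
    for line in it:
        if line.startswith(begin):
            # Consume the same iterator until the end marker.
            for line in it:
                if line.startswith(end):
                    break  # 'break' instead of 'return': keep scanning for the next section
                yield line.rstrip()


sample = io.StringIO("<type>\n1\n2\n</type>\nnoise\n<type>\n3\n</type>\n")
result = list(sections(sample, '<type>', '</type>'))
print(result)  # ['1', '2', '3']
```

Because the inner loop shares the outer loop's iterator, breaking out of it resumes the outer loop at the line after the end marker, so the next begin marker is still found.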

Reading a section of a file in Python
