Read all lines between two strings

Date: 2020-03-11 09:47:06

Tags: python readlines

I want to extract the lines of an XML file that lie between <userData> and </userData>. Here is an example:

<userData code="viPartListRailML" value="1">
    <partRailML s="0.0000000000000000e+00" id="0"/>
    <partRailML s="2.0000000000000000e+01" id="1"/>
    <partRailML s="9.4137883373059267e+01" id="2"/>
</userData>

Here is the code I am trying:

import re

shakes = open("N:\SAJAT_MAPPAK\IGYULAVICS\/adhoc\pythonXMLread\probaxml\github_minta.xml", "r")
for x in shakes:
    if "userData" in x:
        print x
        continue
    if "/userData" in x:
        break

The problem is that it only returns the lines containing <userData and </userData>. How can I modify it to get the lines between these two "words"?

3 Answers:

Answer 0 (score: 1)

Assuming the file contains a single <userData> block, you can extract the lines inside that block like this:

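with open("./file.xml", "r") as shakes:
    inblock = False
    for x in shakes:
        if "</userData" in x:
            inblock = False
        if inblock:
            print(x, end="")  # each line already ends with "\n"
        if "<userData" in x:  # "<userData" does not match the closing "</userData>" tag
            inblock = True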
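
The same flag idea can be wrapped in a reusable generator. This is a minimal sketch: the function name lines_between and the in-memory sample string are assumptions for illustration, not part of the answer. It yields only the lines strictly between the markers and keeps working if the file contains several blocks.

```python
from typing import Iterable, Iterator


def lines_between(lines: Iterable[str], start: str, end: str) -> Iterator[str]:
    """Yield the lines strictly between a line containing `start`
    and the next line containing `end` (marker lines excluded)."""
    inblock = False
    for line in lines:
        if inblock and end in line:
            inblock = False      # closing marker: stop collecting
        elif inblock:
            yield line           # inside the block
        elif start in line:
            inblock = True       # opening marker: start collecting from the next line


# Hypothetical sample standing in for the XML file.
sample = """<userData code="viPartListRailML" value="1">
    <partRailML s="0.0000000000000000e+00" id="0"/>
    <partRailML s="2.0000000000000000e+01" id="1"/>
    <partRailML s="9.4137883373059267e+01" id="2"/>
</userData>
"""

for line in lines_between(sample.splitlines(), "<userData", "</userData"):
    print(line)
```

With a real file you would pass the open file object instead, e.g. lines_between(f, "<userData", "</userData") inside a with open(...) as f: block; lines read from a file keep their trailing "\n".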

But it is more robust to read the file with an XML parser, for example:

import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')

for data in tree.getroot().iter('userData'):
    for child in data:
        # encoding="unicode" returns str instead of bytes
        print(ET.tostring(child, encoding="unicode"))
        # or something else, eg:
        # print(child.tag)
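
Once the file is parsed, you usually want the attribute values rather than the raw text of each line. A minimal sketch, assuming the <userData> element sits inside some enclosing root element (the <root> wrapper here is hypothetical) and that every partRailML child carries id and s attributes:

```python
import xml.etree.ElementTree as ET

# Hypothetical document; ET.fromstring parses a string the way ET.parse parses a file.
sample = """<root>
  <userData code="viPartListRailML" value="1">
    <partRailML s="0.0000000000000000e+00" id="0"/>
    <partRailML s="2.0000000000000000e+01" id="1"/>
    <partRailML s="9.4137883373059267e+01" id="2"/>
  </userData>
</root>"""

root = ET.fromstring(sample)
parts = []
for data in root.iter("userData"):
    for child in data:
        # child.get() reads an attribute as a string; convert to numbers here
        parts.append((int(child.get("id")), float(child.get("s"))))

print(parts)
```

Because the parser works on elements rather than lines, this does not depend on how the file is indented or whether a tag spans several lines.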

By the way, please use Python 3 if you can; Python 2 has reached end of life.

Answer 1 (score: 1)

A simple approach is to add a variable that tells you whether you are between those words:

# raw string (r"..."), otherwise "\a" in "\adhoc" is read as an escape sequence
with open(r"N:\SAJAT_MAPPAK\IGYULAVICS\/adhoc\pythonXMLread\probaxml\github_minta.xml", "r") as shakes:
    t = False
    for x in shakes:
        if t:
            print(x, end="")  # the </userData> line is printed too
        if "/userData" in x:
            t = False
        elif "userData" in x:  # "userData" also matches "/userData", hence elif
            t = True
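
The flag approach also extends naturally to files with more than one <userData> block. A minimal sketch that collects each block into its own list; the function name collect_blocks and the two-block sample are hypothetical:

```python
def collect_blocks(lines, start="<userData", end="</userData"):
    """Return a list of blocks; each block is the list of lines
    strictly between a start line and the following end line."""
    blocks = []
    current = None  # None means "not inside a block"
    for line in lines:
        if current is not None and end in line:
            blocks.append(current)  # block finished
            current = None
        elif current is not None:
            current.append(line)    # line inside the current block
        elif start in line:
            current = []            # a new block begins on the next line
    return blocks


# Hypothetical input with two separate blocks.
sample = """<userData code="a">
  <partRailML id="0"/>
</userData>
<other/>
<userData code="b">
  <partRailML id="1"/>
  <partRailML id="2"/>
</userData>"""

for block in collect_blocks(sample.splitlines()):
    print(block)
```

Using None rather than a boolean for "outside a block" lets a single variable carry both the state and the lines gathered so far.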

Answer 2 (score: 0)

You can use itertools.dropwhile to skip ahead to the <userData section, and then itertools.takewhile to read up to </userData:

import itertools as it

with open("file.xml") as f:  # adjust the path to your file
    text = f.read()

result = it.takewhile(
    lambda x: '</userData' not in x,
    it.dropwhile(
        lambda x: '<userData' not in x,
        text.splitlines()
    )
)
print('\n'.join(result))
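
Since a file object is itself an iterator over lines, dropwhile and takewhile can also be applied to it directly, without reading the whole file into a string first. A minimal sketch, where io.StringIO and its contents are stand-ins (assumptions) for a real open("file.xml") call:

```python
import io
import itertools as it

# io.StringIO stands in for open("file.xml"); a real file object
# is iterated line by line in the same way.
f = io.StringIO("""<root>
<userData code="viPartListRailML" value="1">
    <partRailML s="0.0000000000000000e+00" id="0"/>
    <partRailML s="2.0000000000000000e+01" id="1"/>
</userData>
<tail/>
</root>
""")

# Both functions consume the iterator lazily, one line at a time.
result = it.takewhile(
    lambda x: '</userData' not in x,
    it.dropwhile(lambda x: '<userData' not in x, f),
)
lines = [line.rstrip("\n") for line in result]  # file lines keep their "\n"
print("\n".join(lines))
```

Note that takewhile has to read the </userData> line to decide to stop, so that line is consumed from f and will not be seen by any later iteration over the same file object.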

If you want to skip the <userData line itself, you can add itertools.islice:

# text holds the XML file's contents as a single string
result = it.takewhile(
    lambda x: '</userData' not in x,
    it.islice(it.dropwhile(
        lambda x: '<userData' not in x,
        text.splitlines()
    ), 1, None)
)
print('\n'.join(result))
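
An equivalent way to drop the opening line is to call next() once on the dropwhile iterator before handing it to takewhile. This is a sketch of that variation, with a hypothetical inline text in place of the file contents:

```python
import itertools as it

# Hypothetical file contents.
text = """<userData code="viPartListRailML" value="1">
    <partRailML s="0.0000000000000000e+00" id="0"/>
    <partRailML s="2.0000000000000000e+01" id="1"/>
    <partRailML s="9.4137883373059267e+01" id="2"/>
</userData>"""

rest = it.dropwhile(lambda x: '<userData' not in x, text.splitlines())
next(rest, None)  # consume the <userData ...> line itself; returns None if it is missing
result = list(it.takewhile(lambda x: '</userData' not in x, rest))
print('\n'.join(result))
```

The default argument to next() avoids a StopIteration when the file has no <userData line at all; in that case result is simply empty.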