Question

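I have an XML file that looks like this: example. The file contains 5000 profiles (datasets). Each profile has 92 rows and 5 columns, and consecutive profiles are separated by 2 lines (which I want to skip). I want to extract some selected profiles and write them to another file. I wrote the following program to do this, but with this code I can only extract a limited number of profiles.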

    with open('file.xml') as f:
      for j in lat :
        l=94*j
        i=l-92
        g.write('%s' % j)
        g.write(":-profile")
        g.write("\n")
        for lines in itertools.islice(f, i, l): 
          g.write('%s' % lines)
        g.write("</Matrix>")
        g.write("\n")
        g.write('<Matrix nrows="92" ncols="5">')
        g.write("\n")

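When I print 'j', it takes on every value in 'lat' (the profiles I selected). In my output file, however, I only get the values for a few profiles, and after that it only shows the last lines: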

        g.write("</Matrix>")
        g.write("\n")
        g.write('<Matrix nrows="92" ncols="5">')
        g.write("\n")

I know this is silly, but I am a beginner in Python programming. Please help.

I tried printing 'j' and 'lines' together. After a certain number of iterations, the output shows only the value of j, and no lines are printed.

Answer 1

    import re

    # read the selected profile numbers (whitespace-separated integers)
    nums_profiles = set()
    with open("lat_sel.dat", "r") as num_profiles_file:
        for line in num_profiles_file:
            for i in line.split():
                nums_profiles.add(int(i))

    with open('extracted_output.xml', 'w') as output_file, open('chevallierl91_clear_q.xml', "r") as matrix_file:
        profile_counter = 0

        # iterate over the file lazily instead of loading it all with readlines()
        for line in matrix_file:

            # save the ending xml tags; 'continue' keeps them from being
            # written a second time when the last profile is also selected
            if any(end_tag in line for end_tag in ('</Array>', '</arts>')):
                output_file.write(line)
                continue

            # counting profiles
            if 'Matrix nrows' in line:
                profile_counter += 1

            # save header of xml file
            if profile_counter == 0:
                if '<Array type="Matrix" nelem=' in line:
                    # format only the replacement, not the whole line
                    line = re.sub('nelem="[0-9]+"', 'nelem="%d"' % len(nums_profiles), line)

                output_file.write(line)

            # check if profile is the one which we need. If so, save data
            elif profile_counter in nums_profiles:
                output_file.write(line)
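
As a side note on why the loop in the question stops producing rows: `itertools.islice(f, i, l)` does not jump to absolute line `i`. It skips `i` lines from wherever the file iterator currently is and then yields up to `l - i` lines. Because `f` is never rewound, every later profile is counted from the end of the previous read, so the file is exhausted after a few profiles and only the hard-coded `Matrix` tags get written. Below is a minimal sketch of the offset correction, assuming the selected profile numbers are distinct and sorted in ascending order, and reusing the question's own arithmetic (profile `j` spans lines `94*j - 92` up to `94*j`). The helper name `extract_profiles` and its `block`/`rows` parameters are hypothetical, introduced only for illustration.

```python
import itertools


def extract_profiles(lines, selected, block=94, rows=92):
    """Yield (profile_number, row_lines) for each selected profile.

    Assumptions (not from the original post): `selected` is sorted
    ascending with no duplicates, and profile j occupies the 0-based
    line range [block*j - rows, block*j), as in the question's code.
    """
    it = iter(lines)
    position = 0  # number of lines consumed from the iterator so far
    for j in selected:
        start = block * j - rows
        stop = block * j
        # islice counts from the current position of the iterator,
        # so convert the absolute line numbers to relative offsets.
        chunk = list(itertools.islice(it, start - position, stop - position))
        position = stop
        yield j, chunk
```

With an open file this could be used as `for j, rows in extract_profiles(f, sorted(lat)): g.writelines(rows)`. Rewinding with `f.seek(0)` before each `islice` call would also work, at the cost of rereading the file once per profile.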

Python: error while extracting data from an input file (XML file), the loop stops after some iterations
