Question

So I have several .txt files, each with more than 500,000 lines. All of them contain sections that I want to extract into their own .txt files.

To do this, I use the following code:

for i, structure in enumerate(structures):
    with open("data.txt", 'r') as f:
        structure_data = open('data_new.txt', 'w')
        copy = False
        for line in f:
            if line.strip() == "Structure: {}".format(structures[i]):
                structure_data.write(line)
                copy = True
            elif line.strip() == "Structure: {}".format(structures[i+1]):
                copy = False
            elif copy:
                structure_data.write(line)
    structure_data.close()

Here structures is the list of structures I have.

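So basically, every .txt file contains lines of the form Structure: <some structure in the structures list>. I want to extract the data between the two strings structures[i] and structures[i+1] in the data file. The code above does this, and I get new .txt files containing the data I want; however, when I reach the last structure in the .txt file, I get the following error: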

    elif line.strip() == "Structure: {}".format(structures[i+1]):
IndexError: list index out of range

As far as I can tell, the reason is that the last section of the .txt file has no "end" Structure: <structure> line, so copy = False can never be set.
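Strictly speaking, the exception comes from indexing the Python list rather than from anything in the file: on the last pass of the outer loop, i equals len(structures) - 1, so evaluating structures[i+1] raises IndexError the first time the elif is reached, whether or not the file contains a matching line. A minimal illustration, assuming the three structure names from the sample data in the update (next_marker is a hypothetical helper):

```python
structures = ["TR", "SV", "DY"]  # assumed, matching the sample data


def next_marker(i):
    # Same expression as the question's elif: the header that should end section i.
    return "Structure: {}".format(structures[i + 1])


print(next_marker(0))  # Structure: SV
print(next_marker(1))  # Structure: DY
try:
    next_marker(len(structures) - 1)  # last index: structures[i + 1] does not exist
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```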

So I do get the .txt output I want, but as you know, few things are worse than code that throws errors. Is there a way to tell it that if there is no such "end line", everything is fine?

Update: this is roughly what the data in data.txt looks like:

Structure: TR

Dose [cGy] Ratio of Total Structure Volume [%]
         0                       100
  0.100619                       100
  0.201238                       100
  0.301857                       100
  0.402476                       100
  0.503096                       100
  0.603715                       100
  0.704334                       100
  0.804953                       100
  0.905572                       100

Structure: SV


Dose [cGy] Ratio of Total Structure Volume [%]
         0                       100
  0.100619                       100
  0.201238                       100
  0.301857                       100
  0.402476                       100
  0.503096                       100
  0.603715                       100
  0.704334                       100
  0.804953                       100
  0.905572                       100


Structure: DY

Dose [cGy] Ratio of Total Structure Volume [%]
         0                       100
  0.100619                   88.2441
  0.201238                   76.4882
  0.301857                   64.7324
  0.402476                   52.9765
  0.503096                   41.2206
  0.603715                   29.4647
  0.704334                    17.707
  0.804953                   17.6784
  0.905572                   17.6499

So the structures list already contains the structures TR, SV and DY.

So in the for line in f loop, I want to take the text/data between the line Structure: structures[i] and the line Structure: structures[i+1], save it to a text file, and keep looping over structures until the whole list has been processed. But as mentioned, when I reach the last section there is no ending Structure: structures[i+1] line, so I get the error. This error is what I want to avoid.

Answer 1

One simple solution is to append a dummy structure to the end of structures, one that does not appear anywhere in the file. Then you can write your loop like this:

for structure1, structure2 in zip(structures[:-1], structures[1:]):

This iterates over all pairs of consecutive structures.
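A fuller sketch of this approach, assuming the file layout from the update (each section starts with a line Structure: <name>). The function name extract_with_sentinel, the sentinel string, and the "<name>.txt" output naming are hypothetical choices. Because the sentinel never matches a line, the last real section simply runs until the end of the file:

```python
# Hypothetical dummy name; it must never occur as a "Structure:" header in the data.
SENTINEL = "__NO_SUCH_STRUCTURE__"


def extract_with_sentinel(data_path, structures, out_pattern="{}.txt"):
    padded = list(structures) + [SENTINEL]  # copy, so the caller's list is untouched
    # Consecutive pairs, e.g. (TR, SV), (SV, DY), (DY, SENTINEL)
    for current, following in zip(padded[:-1], padded[1:]):
        start = "Structure: {}".format(current)
        stop = "Structure: {}".format(following)
        with open(data_path) as f, open(out_pattern.format(current), "w") as out:
            copy = False
            for line in f:
                stripped = line.strip()
                if stripped == start:
                    copy = True
                elif stripped == stop:
                    copy = False
                if copy:
                    out.write(line)  # includes the "Structure:" header line itself


# Usage: extract_with_sentinel("data.txt", ["TR", "SV", "DY"])
```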

Another solution (which avoids the dummy structure) is to replace

elif line.strip() == "Structure: {}".format(structures[i+1]):

with

elif i+1 != len(structures) and line.strip() == "Structure: {}".format(structures[i+1]):

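If the first part is false, the second part of the condition (the part that causes the error) is never evaluated. If you decide to use this version, you may also want to replace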
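Python's and evaluates its right operand only when the left one is true, so the guard i+1 != len(structures) keeps structures[i+1] from ever being indexed on the last pass. A minimal check, assuming the three sample structure names (is_end_marker is a hypothetical helper wrapping the condition above):

```python
structures = ["TR", "SV", "DY"]  # assumed, matching the sample data


def is_end_marker(line, i):
    # The right-hand side (and its indexing) runs only when i + 1 is a valid index.
    return i + 1 != len(structures) and line.strip() == "Structure: {}".format(structures[i + 1])


print(is_end_marker("Structure: SV\n", 0))  # True
print(is_end_marker("Structure: SV\n", 2))  # False, and no IndexError is raised
```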

for i, structure in enumerate(structures):

with

for i in range(len(structures)):

because you don't actually use the variable structure anywhere.
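Combining both suggestions from this answer with the question's loop could look like the sketch below. Besides the guard, it fixes the misplaced parenthesis in the open(...) call and uses with for the output file. As an assumption not taken from the question, each section goes to its own file named after the structure (extract_with_guard and out_pattern are hypothetical names), since the question's data_new.txt would be overwritten on every pass in 'w' mode.

```python
def extract_with_guard(data_path, structures, out_pattern="{}.txt"):
    for i in range(len(structures)):  # only the index i is needed
        start = "Structure: {}".format(structures[i])
        with open(data_path) as f, open(out_pattern.format(structures[i]), "w") as structure_data:
            copy = False
            for line in f:
                if line.strip() == start:
                    structure_data.write(line)
                    copy = True
                # The length check comes first, so on the last pass `and`
                # short-circuits and structures[i + 1] is never evaluated.
                elif i + 1 != len(structures) and line.strip() == "Structure: {}".format(structures[i + 1]):
                    copy = False
                elif copy:
                    structure_data.write(line)


# Usage: extract_with_guard("data.txt", ["TR", "SV", "DY"])
```

Like the question's version, this re-reads the whole file once per structure; with files of 500,000+ lines, that cost grows with the number of structures.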

Writing data between certain strings in a text file (error on the last element)
