Question

I have a large data file like this:

133621    652.4   496.7  1993.0 ...
END       SAMPLES EVENTS  RES  271.0     2215.0 ...
ESACC     935.6   270.6  2215.0 ...
115133    936.7   270.3  2216.0 ...
115137    936.4   270.4  2219.0 ...
115141    936.1   271.0  2220.0 ...
ESACC L   114837    115141  308   938.5   273.3    2200
115145    936.3   271.8  2220.0 ...
END 115146  SAMPLES EVENTS  RES   44.11   44.09
SFIX L   133477
133477    650.8   500.0  2013.0 ...
133481    650.2   499.9  2012.0 ...
ESACC     650.0   500.0  2009.0 ...

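I want to pull only the ESACC data into trials. Whenever an END line appears, the ESACC data that preceded it should be aggregated into one trial. Right now I can get the first block of ESACC data into a file, but because the loop restarts from the beginning of the data, it keeps grabbing only the first block, so I end up with 80 trials that contain exactly the same data.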

for i in range(num_trials):
   with open(fid) as testFile:
       for tline in testFile:

           if 'END' in tline:
               fid_temp_start.close()
               fid_temp_end.close()   #Close the files
               break

           elif 'ESACC' in tline:

               tline_snap = tline.split()
               sac_x_start = tline_snap[4]
               sac_y_start = tline_snap[5]

               sac_x_end = tline_snap[7]
               sac_y_end = tline_snap[8]

My question: how do I iterate to the next block of data without grabbing the previous blocks again?

Answer 1

Try rewriting your code like this:

def data_parse(filepath): #Make it a function
    try:
        with open(filepath) as testFile:
            tline = '' #Initialize tline
            while True: #Switch to an infinite while loop (I'll explain why)
                while 'ESACC' not in tline: #Skip lines until one containing 'ESACC' is found
                    tline = next(testFile)  #(since it seems like you're doing that anyway)

                tline_snap = tline.split()
                trial = [tline_snap[4],'','',''] #Initialize list and assign first value
                trial[1] = tline_snap[5]

                trial[2] = tline_snap[7]
                trial[3] = tline_snap[8]

                while 'END' not in tline:  #Again, seems like you're skipping lines
                    tline = next(testFile) #so I'll do the same

                yield trial #Output list, save function state

    except StopIteration:
        fid_temp_start.close() #I don't know where these enter the picture
        fid_temp_end.close()   #but you closed them so I will too
        testFile.close()       #Redundant but harmless: the with block has already closed it

#Now, initialize a new list and call the function:
trials = list()
for trial in data_parse(fid):
    trials.append(trial) #Creates a list of lists

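This creates a generator function. By using yield instead of return, the function hands back a value and saves its state. The next time a value is requested from it (as happens repeatedly in the for loop at the end), the function picks up where it left off. Execution resumes at the line after the most recently executed yield statement (which in this case restarts the while loop), and, importantly, it remembers the values of any variables (such as the value of tline and the point where it stopped in the data file).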
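
To see the state-saving behavior in isolation, here is a minimal sketch that does not touch your data. The function name second_field_after_marker and the sample lines are made up for illustration:

```python
def second_field_after_marker(lines, marker="ESACC"):
    """Yield the second field of every line whose first field is `marker`."""
    iterator = iter(lines)          # one shared position, like an open file
    for line in iterator:
        fields = line.split()
        if fields and fields[0] == marker:
            yield fields[1]         # pause here; resume on the next request


sample = ["ESACC 1.0", "115133 2.0", "ESACC 3.0", "END"]
gen = second_field_after_marker(sample)
first = next(gen)    # runs until the first yield
second = next(gen)   # resumes after that yield, not from the top of `sample`
print(first, second)
```

The second next() call continues from the line after the first match, which is exactly what keeps the parser from re-reading the first block.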

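When you reach the end of the file (and have therefore recorded all trials), the next execution of tline = next(testFile) raises a StopIteration error. The try-except structure catches that error and uses it to exit the while loop and close the files. That is why we use an infinite loop: we want to keep looping until that error forces us out.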
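
A small sketch of that exit mechanism, using io.StringIO as a stand-in for an open file (the two lines of content are illustrative only):

```python
import io

# Stand-in for an open file; the content is made up for this example.
fake_file = io.StringIO("ESACC a\nEND b\n")

lines_read = []
reached_end = False
try:
    while True:                       # no explicit end condition
        lines_read.append(next(fake_file))
except StopIteration:                 # raised by next() once the file is exhausted
    reached_end = True

# With a default value, next() returns it instead of raising.
leftover = next(fake_file, None)
print(lines_read, reached_end, leftover)
```

If you would rather not rely on the exception, passing a default to next() and checking for it is an alternative way to detect the end of the file.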

At the end of all this, your data is stored in trials as a list of lists, where each item equals [sac_x_start, sac_y_start, sac_x_end, sac_y_end] (as you defined them in your code) for one trial.
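
One thing to keep in mind when you use that list: split() returns strings, so the values need converting before you do arithmetic on them. A rough sketch, where displacements is a hypothetical helper and the numbers are invented rather than taken from your file:

```python
def displacements(trials):
    """Return (dx, dy) per trial; fields arrive as strings from split()."""
    result = []
    for x_start, y_start, x_end, y_end in trials:
        result.append((float(x_end) - float(x_start),
                       float(y_end) - float(y_start)))
    return result


# Invented values in the [sac_x_start, sac_y_start, sac_x_end, sac_y_end] layout.
example_trials = [["1.0", "2.0", "4.0", "6.0"]]
print(displacements(example_trials))
```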

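Note: It looks to me like your code skips lines entirely when they contain neither ESACC nor END. I've replicated that, but I'm not sure it's what you want. If you want to pick up the lines in between, you can rewrite this fairly simply by adding to the 'END' loop, like so: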

while 'END' not in tline:
    tline = next(testFile)
    #(put assignment operations to be applied to each line here)

Of course, you'd have to adjust the variables you use to store this data accordingly.
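
Since the question says the ESACC data before each END should be aggregated into one trial, here is one possible sketch of that variant. It is not the code above: parse_trials is a hypothetical name, it matches on the first field of each line rather than on a substring, and it keeps the raw fields so you can pick the indices you need afterwards.

```python
def parse_trials(lines):
    """Yield one list per trial: every ESACC line seen before the next END."""
    current = []                         # ESACC rows for the trial in progress
    for line in lines:                   # single pass; never restarts
        fields = line.split()
        if not fields:
            continue                     # skip blank lines
        if fields[0] == "END":
            yield current                # trial finished (may be empty)
            current = []
        elif fields[0] == "ESACC":
            current.append(fields[1:])   # keep the raw fields; pick indices later
    # ESACC lines after the last END are dropped, since no END closed them.


# Made-up lines that only mimic the shape of the file in the question.
sample = [
    "ESACC L 1 2",
    "115133 936.7 270.3",
    "ESACC L 3 4",
    "END 115146 SAMPLES",
    "ESACC L 5 6",
    "END 133500 SAMPLES",
]
trials = list(parse_trials(sample))
print(trials)
```

Because it only iterates over lines, the same function should also accept an open file object, for example inside with open(fid) as f.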

Edit: Dear god, I just noticed the age of this question.
