How to split a log file into multiple csv files using python

Date: 2016-11-21 12:39:13

Tags: python csv

I'm very new to python and coding, so apologies in advance for any stupid questions. My program needs to split an existing log file into several *.csv files (run1.csv, run2.csv, ...) based on the keyword "MYLOG". When the keyword appears, it should start copying the two required columns into a new file until the keyword appears again. When it is done, there should be as many csv files as there are occurrences of the keyword.

53.2436     EXP     MYLOG: START RUN specs/run03_block_order.csv
53.2589     EXP     TextStim: autoDraw = None
53.2589     EXP     TextStim: autoDraw = None
55.2257     DATA    Keypress: t
57.2412     DATA    Keypress: t
59.2406     DATA    Keypress: t
61.2400     DATA    Keypress: t
63.2393     DATA    Keypress: t
...
89.2314     EXP     MYLOG: START BLOCK scene [specs/run03_block01.csv]
89.2336     EXP     Imported specs/run03_block01.csv as conditions
89.2339     EXP     Created sequence: sequential, trialTypes=9
...

[EDIT]: The output of each file (run*.csv) should look like this:

onset       type
53.2436     EXP     
53.2589     EXP     
53.2589     EXP     
55.2257     DATA    
57.2412     DATA    
59.2406     DATA    
61.2400     DATA    
...

The program creates as many run*.csv files as needed, but I can't get the required columns stored in the new files. When it is done, all I get are empty csv files. If I set the counter variable to == 1, it just creates one big file containing the desired columns.

Thanks again!

import csv

QUERY = 'MYLOG'

with open('localizer.log', 'rt') as log_input:
    i = 0

    for line in log_input:

        if QUERY in line:
            i = i + 1

            with open('run' + str(i) + '.csv', 'w') as output:
                reader = csv.reader(log_input, delimiter = ' ')
                writer = csv.writer(output)
                content_column_A = [0]
                content_column_B = [1]

                for row in reader:
                    content_A = list(row[j] for j in content_column_A)
                    content_B = list(row[k] for k in content_column_B)
                    writer.writerow(content_A)
                    writer.writerow(content_B)

2 Answers:

Answer 0 (score: 1)

Looking at the code, there are a few things that are probably wrong:

  1. The csv reader should take a file handle, not a single line.
  2. The reader's delimiter should not be a single space character, since the actual delimiter in the log appears to be a variable number of spaces.
  3. The loop logic seems a bit off, somewhat mixing up files/lines/rows.
  4. You are probably looking for something like the code below (pending clarification of the question):

    import csv
    NEW_LOG_DELIMITER = 'MYLOG'
    
    def write_buffer(_index, buffer):
        """
        This function takes an index and a buffer.
        The buffer is just an iterable of iterables (ex a list of lists)
        Each buffer item is a row of values.
        """
        filename = 'run{}.csv'.format(_index)
        with open(filename, 'w') as output:
            writer = csv.writer(output)
            writer.writerow(['onset', 'type'])  # adding the heading
            writer.writerows(buffer)
    
    current_buffer = []
    _index = 1
    
    with open('localizer.log', 'rt') as log_input:
        for line in log_input:
            # will deal ok with multi-space as long as
            # you don't care about the last column
            fields = line.split()[:2]
            if NEW_LOG_DELIMITER not in line or not current_buffer:
                # If it's the first line (the current_buffer is empty)
                # or the line does NOT contain "MYLOG" then
                # collect it until it's time to write it to file.
                current_buffer.append(fields)
            else:
                write_buffer(_index, current_buffer)
                _index += 1
                current_buffer = [fields]  # EDIT: fixed bug, new buffer should not be empty
        if current_buffer:
            # We are now out of the loop,
            # if there's an unwritten buffer then write it to file.
            write_buffer(_index, current_buffer)
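
A quick note on the whitespace handling above: str.split() with no argument splits on any run of spaces or tabs, so the variable spacing between the log columns is not a problem. A minimal check using one of the sample lines from the question:

    # split() with no separator collapses any run of whitespace,
    # so only the first two columns (onset and type) are kept.
    line = '53.2436     EXP     MYLOG: START RUN specs/run03_block_order.csv'
    print(line.split()[:2])  # ['53.2436', 'EXP']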
    

Answer 1 (score: 0)

You can use pandas to simplify this problem.

Import pandas and read in the log file.

import pandas as pd

df = pd.read_fwf('localizer2.log', header=None)
df.columns = ['onset', 'type', 'event']
df.set_index('onset', inplace=True)

Set a flag where the third column starts with 'MYLOG'

df['flag'] = 0
df.loc[df.event.str[:5] == 'MYLOG', 'flag'] = 1
df.flag = df['flag'].cumsum()
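
To see why the cumsum call produces run numbers: each MYLOG row adds 1 to the running total, so every row from one marker up to (but not including) the next marker shares the same flag value. A small sketch with toy data (not the real log):

import pandas as pd

# 1 marks a 'MYLOG' row, 0 marks an ordinary row;
# the cumulative sum turns the markers into group ids.
s = pd.Series([1, 0, 0, 1, 0, 1, 0, 0])
print(s.cumsum().tolist())  # [1, 1, 1, 2, 2, 3, 3, 3]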

Save each run as a separate run*.csv file

for i in range(1, df.flag.max()+1):
    df.loc[df.flag == i, 'event'].to_csv('run{0}.csv'.format(i))
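
If you prefer not to loop over range(1, df.flag.max()+1) by hand, a groupby over the flag column does the same thing (a sketch under the same assumptions as above; any rows before the first MYLOG line would have flag 0 and are skipped here):

# Equivalent grouping approach: write one csv per flag value.
for flag_value, group in df[df.flag > 0].groupby('flag'):
    group['event'].to_csv('run{0}.csv'.format(flag_value))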

EDIT: It looks like your format is different from what I originally assumed, so I changed the code to use pd.read_fwf. My localizer.log file is a copy-and-paste of your original data, so hopefully this works for you. In my original post I assumed there was no header; if the file does have a header, remove header=None and df.columns = ['onset', 'type', 'event'].