I'm very new to Python and to coding in general, so apologies in advance for any stupid questions. My program needs to split an existing log file into several *.csv files (run1.csv, run2.csv, ...) based on the keyword 'MYLOG'. When the keyword appears, it should start copying the two required columns into a new file until the keyword appears again. When it is done, there should be as many csv files as there are occurrences of the keyword.
53.2436 EXP MYLOG: START RUN specs/run03_block_order.csv
53.2589 EXP TextStim: autoDraw = None
53.2589 EXP TextStim: autoDraw = None
55.2257 DATA Keypress: t
57.2412 DATA Keypress: t
59.2406 DATA Keypress: t
61.2400 DATA Keypress: t
63.2393 DATA Keypress: t
...
89.2314 EXP MYLOG: START BLOCK scene [specs/run03_block01.csv]
89.2336 EXP Imported specs/run03_block01.csv as conditions
89.2339 EXP Created sequence: sequential, trialTypes=9
...
[EDIT]: The output for each file (run*.csv) should look like this:
onset type
53.2436 EXP
53.2589 EXP
53.2589 EXP
55.2257 DATA
57.2412 DATA
59.2406 DATA
61.2400 DATA
...
The program creates as many run*.csv files as needed, but I can't get the desired columns stored in the new files. When it finishes, all I get are empty csv files. If I change the counter variable to == 1, it just creates one big file containing the desired columns.
Thanks again!
import csv

QUERY = 'MYLOG'

with open('localizer.log', 'rt') as log_input:
    i = 0
    for line in log_input:
        if QUERY in line:
            i = i + 1
            with open('run' + str(i) + '.csv', 'w') as output:
                reader = csv.reader(log_input, delimiter=' ')
                writer = csv.writer(output)
                content_column_A = [0]
                content_column_B = [1]
                for row in reader:
                    content_A = list(row[j] for j in content_column_A)
                    content_B = list(row[k] for k in content_column_B)
                    writer.writerow(content_A)
                    writer.writerow(content_B)
Answer 0 (score: 1)
Looking at the code, there are a few things that are probably wrong:

- csv.reader(log_input, delimiter=' ') iterates over the same open file object as the outer for line in log_input loop, so the first time the keyword is found the inner loop consumes the rest of the file (see the sketch below).
- writer.writerow(content_A) followed by writer.writerow(content_B) writes the two columns as two separate rows; writerow expects one complete row, e.g. writer.writerow([row[0], row[1]]).
- Nothing stops the inner for row in reader loop at the next occurrence of MYLOG, so the data never gets split into one file per run.
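To see that first point in isolation, here is a minimal sketch (not your program, just an illustration) of what happens when two loops share one file iterator:

    with open('localizer.log', 'rt') as log_input:
        for line in log_input:          # the outer loop reads one line...
            print('outer:', line.strip())
            for rest in log_input:      # ...and this inner loop drains everything that is left
                pass
            # back here, log_input is exhausted, so the outer loop
            # ends after a single iteration

The same thing happens with csv.reader(log_input, ...): it is just another iterator over the same open file.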
You might be looking at something like the following code (pending clarification of the question):
import csv

NEW_LOG_DELIMITER = 'MYLOG'

def write_buffer(_index, buffer):
    """
    This function takes an index and a buffer.
    The buffer is just an iterable of iterables (ex a list of lists)
    Each buffer item is a row of values.
    """
    filename = 'run{}.csv'.format(_index)
    with open(filename, 'w') as output:
        writer = csv.writer(output)
        writer.writerow(['onset', 'type'])  # adding the heading
        writer.writerows(buffer)

current_buffer = []
_index = 1

with open('localizer.log', 'rt') as log_input:
    for line in log_input:
        # will deal ok with multi-space as long as
        # you don't care about the last column
        fields = line.split()[:2]
        if NEW_LOG_DELIMITER not in line or not current_buffer:
            # If it's the first line (the current_buffer is empty)
            # or the line does NOT contain "MYLOG" then
            # collect it until it's time to write it to file.
            current_buffer.append(fields)
        else:
            write_buffer(_index, current_buffer)
            _index += 1
            current_buffer = [fields]  # EDIT: fixed bug, new buffer should not be empty

if current_buffer:
    # We are now out of the loop,
    # if there's an unwritten buffer then write it to file.
    write_buffer(_index, current_buffer)
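One small note on the output format: csv.writer uses a comma as its default delimiter, which matches the .csv extension. If you would rather have the space-separated layout shown in your sample output, the writer inside write_buffer could be created with an explicit delimiter (a minor variation, not required for the solution above):

    writer = csv.writer(output, delimiter=' ')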
Answer 1 (score: 0)
You can use pandas to simplify this problem.
Import pandas and read in the log file:
import pandas as pd
df = pd.read_fwf('localizer2.log', header=None)
df.columns = ['onset', 'type', 'event']
df.set_index('onset', inplace=True)
Set a flag where the third column starts with 'MYLOG':
df['flag'] = 0
df.loc[df.event.str[:5] == 'MYLOG', 'flag'] = 1
df.flag = df['flag'].cumsum()
Save each run as a separate run*.csv file:
for i in range(1, df.flag.max() + 1):
    df.loc[df.flag == i, 'event'].to_csv('run{0}.csv'.format(i))
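If you want the two columns from the sample output (onset and type) with a header row rather than the event text, a small variation on the same flag idea should work (a sketch, not tested against your full log):

    for i in range(1, df.flag.max() + 1):
        # bring 'onset' back out of the index, keep only the two wanted columns
        run = df.loc[df.flag == i].reset_index()[['onset', 'type']]
        run.to_csv('run{}.csv'.format(i), index=False)  # writes an 'onset,type' header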
EDIT:
It looks like your format was different from what I originally assumed, so I changed the code to use pd.read_fwf. My localizer.log file is a copy-and-paste of your original data, so hopefully this works for you. I assumed in the original post that there was no header; if your file does have a header, remove header=None and df.columns = ['onset', 'type', 'event'].