Question

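I have a log file that I parse with a regular expression. Each match returns three elements:

1) timestamp

2) numberid

3) objectvalue

I want to write these to a CSV file efficiently, because the log file can be very large.
This is what I tried: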

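import csv
import re
from collections import defaultdict

def read_logs(input_file):
    # defaultdict(list) creates an empty list the first time a key is used
    data = defaultdict(list)
    for each in input_file:
        regex_match = re.match(r'', each)  # pattern omitted in the original post
        data['timestamp'].append(regex_match.group(1))
        data['numberid'].append(regex_match.group(2))
        data['objectvalue'].append(regex_match.group(3))
    return data

def main(inputname, outputname):
    with open(inputname) as input_file:
        data = read_logs(input_file)
    # newline='' is what the csv module expects for files it writes
    with open(outputname, 'w', newline='') as out_file:
        write_file(out_file, data)

def write_file(out_file, data):
    out = csv.writer(out_file)
    out.writerow(['timestamp_val', 'numberid', 'objectvalue'])
    # not sure how to write the lists in data from here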

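1) I thought a defaultdict would be the most efficient way to get this data into the file. Its keys are timestamp, numberid and objectvalue, and each key maps to a list of values. How do I write this to a CSV file?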

The sample data looks like this:
data = {'timestamp_val': ['10:10:54', '13:02:07', '03:02:10'], 'numberid': ['AA10', 'BB18', 'FF34'], 'objectvalue': ['NHAG', 'ABCD', 'YTAB']}

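2) If this is not an efficient approach, what would be a better way to do it?

The other approach I thought of is to parse each line with the regular expression and write it to the CSV file in the same pass. Is that a good approach?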
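For question 1, column lists like the sample data can be turned into CSV rows by zipping them together. The following is only a minimal sketch, assuming all three lists have the same length; the helper name `write_columns` is hypothetical, and an in-memory buffer stands in for the output file.

```python
import csv
import io

def write_columns(out_file, data, columns):
    """Write a dict of equal-length column lists as CSV rows."""
    out = csv.writer(out_file)
    out.writerow(columns)
    # zip(*...) pairs the i-th entry of every column into one row
    out.writerows(zip(*(data[name] for name in columns)))

# Sample data from the question
data = {
    'timestamp_val': ['10:10:54', '13:02:07', '03:02:10'],
    'numberid': ['AA10', 'BB18', 'FF34'],
    'objectvalue': ['NHAG', 'ABCD', 'YTAB'],
}

buffer = io.StringIO()  # stands in for a real file opened with newline=''
write_columns(buffer, data, ['timestamp_val', 'numberid', 'objectvalue'])
csv_text = buffer.getvalue()
```

This still holds the whole log in memory before anything is written, so it only makes sense when the lists already exist or the file is small.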

Answer 1

I don't think you need to read the whole file into lists first: write each row as soon as you have read it.

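import csv
import re

def main(inputname, outputname):
    # newline='' is what the csv module expects for files it writes
    with open(inputname) as input_file, open(outputname, 'w', newline='') as out_file:
        out = csv.writer(out_file)
        out.writerow(['timestamp_val', 'numberid', 'objectvalue'])
        for each in input_file:
            regex_match = re.match(r'', each)  # pattern omitted, as in the question
            if regex_match is None:
                continue  # skip lines that do not match
            # write the row immediately instead of collecting it in memory
            out.writerow([regex_match.group(1), regex_match.group(2), regex_match.group(3)])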
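To make the loop above concrete, here is a sketch that can be run as is. The question does not show the real pattern, so the log layout `HH:MM:SS <id> <value>`, the pattern `LINE_RE` and the function name `convert` are all assumptions made for illustration. In-memory buffers stand in for the input and output files.

```python
import csv
import io
import re

# Hypothetical layout: "HH:MM:SS <id> <value>"; replace with the real pattern.
LINE_RE = re.compile(r'(\d{2}:\d{2}:\d{2})\s+(\w+)\s+(\w+)')

def convert(lines, out_file):
    """Stream matching lines to CSV one row at a time; return rows written."""
    out = csv.writer(out_file)
    out.writerow(['timestamp_val', 'numberid', 'objectvalue'])
    written = 0
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        out.writerow(match.groups())
        written += 1
    return written

sample = io.StringIO("10:10:54 AA10 NHAG\nnot a log line\n13:02:07 BB18 ABCD\n")
result = io.StringIO()
count = convert(sample, result)
```

Compiling the pattern once and iterating over the file object means only one line is held in memory at a time, which is the point of writing while reading.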
