Optimize or speed up reading from .xy files into Excel

Time: 2012-11-26 03:47:33

Tags: python excel file batch-file

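I have some .xy files (2 columns with x and y values). I've been trying to read all of them and paste the "y" values into a single Excel file (the "x" values are the same in all of these files). The code I have so far reads the files one by one, but it is extremely slow (about 20 seconds per file). I have a lot of .xy files, so the total time adds up considerably. My code so far is: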

import os,fnmatch,linecache,csv
from openpyxl import Workbook

wb = Workbook() 
ws = wb.worksheets[0]
ws.title = "Sheet1"


def batch_processing(file_name):
    row_count = sum(1 for row in csv.reader(open(file_name)))
    try:
        for row in xrange(1,row_count):

            data = linecache.getline(file_name, row)
            print data.strip().split()[1]   
            print data
            ws.cell("A"+str(row)).value = float(data.strip().split()[0])
            ws.cell("B"+str(row)).value = float(data.strip().split()[1])

        print file_name
        wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
    except IndexError:
        pass


workingdir = r"C:\Users\Mine\Desktop\P22_PC"
os.chdir(workingdir)
for root, dirnames, filenames in os.walk(workingdir):
    for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
        batch_processing(file_name)

Any help is appreciated. Thanks.

1 answer:

Answer 0 (score: 2)

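I think your main problem is that you write every row into Excel and then save the workbook, once for every file in the directory. I'm not sure how long actually writing the values to Excel takes, but simply moving the save out of the loop and saving only once, after everything has been added, should cut the time down somewhat. Also, how big are these files? If they are large, linecache might be a good idea, but assuming they aren't too big, you can probably do without it.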
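For files that fit comfortably in memory, a single pass over the open file is enough: no row count up front and no per-line linecache lookups. A minimal sketch, written for Python 3 and assuming whitespace-separated columns as in the question's split() calls; read_xy is a hypothetical helper name, not part of any library:

```python
def read_xy(path):
    """Return the (x, y) pairs from a whitespace-separated two-column file.

    Assumption: columns are separated by spaces or tabs, as the question's
    split() calls imply. Blank or short lines are skipped, which replaces
    the IndexError catch in the question's code.
    """
    pairs = []
    with open(path) as f:       # the file is closed when the block ends
        for line in f:          # one pass over the file, no row count needed
            parts = line.split()
            if len(parts) < 2:
                continue        # blank or incomplete line
            pairs.append((float(parts[0]), float(parts[1])))
    return pairs
```

Each line is read exactly once, so the cost grows linearly with file size; linecache only pays off when you need random access to specific line numbers.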

def batch_processing(file_name):

    # Using 'with' is a better way to open files - it ensures they are
    # properly closed, etc. when you leave the code block
    with open(file_name, 'r') as f:
        # row_count = sum(1 for row in csv.reader(open(file_name)))
        # ^^^You actually don't need to do this at all (though it is clever :)
        # You are using it now to govern the loop, but the more Pythonic way is
        # to do it as follows
        for line_no, line in enumerate(f):
            # .xy columns are separated by whitespace, not commas, so split the
            # line itself instead of using csv.reader's default comma delimiter
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank lines (what the IndexError catch handled)
            # Create two variables that will hold val1 (x) and val2 (y)
            val1, val2 = parts[:2]
            print val1, val2 # You can also remove this - printing takes time too
            ws.cell("A"+str(line_no+1)).value = float(val1)
            ws.cell("B"+str(line_no+1)).value = float(val2)

    # Doing this here will save the file after you process an entire file.
    # You could save a bit more time and move this to after your walk statement - 
    # that way, you are only saving once after everything has completed
    wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
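Taking "save only once" all the way to the question's stated goal (one workbook, the shared x column once, one y column per file) could look like the sketch below. It is written for Python 3 and assumes every file lists the same x values in the same order, as the question says; find_xy_files, collect_columns and save_xlsx are hypothetical helper names. It also joins root onto each name from os.walk, since os.walk yields bare file names. Rows are written with openpyxl's Worksheet.append instead of per-cell ws.cell(...) calls, and save is called exactly once. The openpyxl import sits inside save_xlsx so the table-building part runs without it.

```python
import fnmatch
import os


def find_xy_files(top, pattern="*_Cs.xy"):
    """Walk `top` and return full paths of matching files, sorted."""
    paths = []
    for root, _dirs, names in os.walk(top):
        for name in fnmatch.filter(names, pattern):
            paths.append(os.path.join(root, name))  # os.walk gives bare names
    return sorted(paths)


def collect_columns(paths):
    """Return (header, rows): x first, then one y column per file.

    Assumes every file lists the same x values in the same order.
    """
    header = ["x"]
    x_values = None
    y_columns = []
    for path in paths:
        xs, ys = [], []
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) < 2:
                    continue  # skip blank or short lines
                xs.append(float(parts[0]))
                ys.append(float(parts[1]))
        if x_values is None:
            x_values = xs
        elif xs != x_values:
            raise ValueError("x values differ in %s" % path)
        # Column title: the file name without directory or extension
        header.append(os.path.splitext(os.path.basename(path))[0])
        y_columns.append(ys)
    rows = [[x] + [col[i] for col in y_columns]
            for i, x in enumerate(x_values or [])]
    return header, rows


def save_xlsx(header, rows, out_path):
    """Write everything to one sheet and save once (requires openpyxl)."""
    from openpyxl import Workbook
    wb = Workbook()
    ws = wb.active
    ws.title = "Sheet1"
    ws.append(header)   # each append writes one whole row
    for row in rows:
        ws.append(row)
    wb.save(out_path)   # the only save, after all files are read
```

For the question's folder, usage would be along the lines of save_xlsx(*collect_columns(find_xy_files(r"C:\Users\Mine\Desktop\P22_PC")), "combined.xlsx"), where combined.xlsx is a made-up output name.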