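I have a number of .xy files (2 columns with x and y values). I have been trying to read all of them and paste the "y" values into a single Excel file (the "x" values are the same in all of these files). The code I have so far reads the files one by one, but it is extremely slow (about 20 seconds per file). I have quite a few .xy files, so the time adds up considerably. The code I have so far is: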
import os,fnmatch,linecache,csv
from openpyxl import Workbook

wb = Workbook()
ws = wb.worksheets[0]
ws.title = "Sheet1"

def batch_processing(file_name):
    row_count = sum(1 for row in csv.reader(open(file_name)))
    try:
        for row in xrange(1,row_count):
            data = linecache.getline(file_name, row)
            print data.strip().split()[1]
            print data
            ws.cell("A"+str(row)).value = float(data.strip().split()[0])
            ws.cell("B"+str(row)).value = float(data.strip().split()[1])
            print file_name
            wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
    except IndexError:
        pass

workingdir = "C:\Users\Mine\Desktop\P22_PC"
os.chdir(workingdir)
for root, dirnames, filenames in os.walk(workingdir):
    for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
        batch_processing(file_name)
Any help is appreciated. Thanks.
Answer 0 (score: 2)
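I think your main problem is that you are writing to Excel and saving the file on every line, for every file in the directory. I'm not sure how long it takes to actually write a value to Excel, but just moving the save out of the loop, so that it runs only once everything has been added, should cut the time down a bit. Also, how big are these files? If they are large, linecache may be a good idea, but assuming they aren't too big, you can probably do without it.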
def batch_processing(file_name):
    # Using 'with' is a better way to open files - it ensures they are
    # properly closed, etc. when you leave the code block
    with open(file_name, 'rb') as f:
        # Note: csv.reader splits on commas by default. If your columns are
        # separated by spaces or tabs, pass a matching delimiter argument
        reader = csv.reader(f)
        # row_count = sum(1 for row in csv.reader(open(file_name)))
        # ^^^You actually don't need to do this at all (though it is clever :)
        # You are using it now to govern the loop, but the more Pythonic way is
        # to do it as follows
        for line_no, line in enumerate(reader):
            # Unpack the split line into two variables that will hold val1 and val2
            val1, val2 = line
            print val1, val2 # You can also remove this - printing takes time too
            ws.cell("A"+str(line_no+1)).value = float(val1)
            ws.cell("B"+str(line_no+1)).value = float(val2)

    # Doing this here will save the file after you process an entire file.
    # You could save a bit more time and move this to after your walk statement -
    # that way, you are only saving once after everything has completed
    wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
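To illustrate the "save only once" idea from the comments above, here is a rough sketch of how the y values from every file could go into one workbook, one column per file. It makes several assumptions that are not in the original post: it is written for Python 3 and a recent openpyxl (using the `ws.cell(row=..., column=...)` form rather than `ws.cell("A1")`), it assumes the columns are separated by whitespace, and the helper names `read_xy`, `collect_columns` and `write_workbook` are made up for this example.

```python
import fnmatch
import os


def read_xy(path):
    """Return (x_values, y_values) from a whitespace-separated two-column file."""
    xs, ys = [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or incomplete lines
            xs.append(float(parts[0]))
            ys.append(float(parts[1]))
    return xs, ys


def collect_columns(workingdir, pattern="*_Cs.xy"):
    """Walk workingdir and return (x_values, [(file_name, y_values), ...])."""
    x_values = None
    columns = []
    for root, dirnames, filenames in os.walk(workingdir):
        for file_name in sorted(fnmatch.filter(filenames, pattern)):
            # Join with root so files in subdirectories are found as well
            xs, ys = read_xy(os.path.join(root, file_name))
            if x_values is None:
                x_values = xs  # assumption: x is identical in every file
            columns.append((file_name, ys))
    return x_values or [], columns


def write_workbook(x_values, columns, out_path):
    """Write x into column A and one y column per file, then save once."""
    # Imported here so the two readers above only need the standard library
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active
    ws.title = "Sheet1"
    ws.cell(row=1, column=1, value="x")
    for row, x in enumerate(x_values, start=2):
        ws.cell(row=row, column=1, value=x)
    for col, (name, ys) in enumerate(columns, start=2):
        ws.cell(row=1, column=col, value=name)  # header: source file name
        for row, y in enumerate(ys, start=2):
            ws.cell(row=row, column=col, value=y)
    wb.save(out_path)  # the only save, after everything has been added
```

You would call it along the lines of `x, cols = collect_columns(workingdir)` followed by `write_workbook(x, cols, "combined.xlsx")`, where `combined.xlsx` is just a placeholder output name. Everything is held in memory until the single save at the end, which should be fine as long as the files are not huge.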