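I have the same code written twice, once with win32com and once with xlrd. The xlrd version finishes the algorithm in under a second, while the win32com version takes minutes.
Here is the win32com version: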
def makeDict(ws):
    """makes dict with key as header name,
    value as tuple of column begin and column end (inclusive)"""
    wsHeaders = {} # key is header name, value is column begin and end inclusive
    for cnum in xrange(9, find_last_col(ws)):
        if ws.Cells(7, cnum).Value:
            wsHeaders[str(ws.Cells(7, cnum).Value)] = (cnum, find_last_col(ws))
            for cend in xrange(cnum + 1, find_last_col(ws)): #finds end column
                if ws.Cells(7, cend).Value:
                    wsHeaders[str(ws.Cells(7, cnum).Value)] = (cnum, cend - 1)
                    break
    return wsHeaders
And here is the xlrd version:
def makeDict(ws):
    """makes dict with key as header name,
    value as tuple of column begin and column end (inclusive)"""
    wsHeaders = {} # key is header name, value is column begin and end inclusive
    for cnum in xrange(8, ws.ncols):
        if ws.cell_value(6, cnum):
            wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, ws.ncols)
            for cend in xrange(cnum + 1, ws.ncols):#finds end column
                if ws.cell_value(6, cend):
                    wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, cend - 1)
                    break
    return wsHeaders
Answer 0 (score: 11)
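(0) You asked "Why is win32com so much slower than xlrd?" ... this question is a bit like "Have you stopped beating your wife?": it rests on a presupposition that may not be true. win32com was written in C by a brilliant programmer, whereas xlrd was written in pure Python by an average programmer. The real difference is that win32com has to call COM, which involves inter-process communication and was written by you-know-who, whereas xlrd reads the Excel file directly. There is also a fourth party in this scenario: YOU. Please read on.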
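(1) You haven't shown us the source of the find_last_col() function that you call repeatedly in the COM code. In the xlrd code, you are happy to use the same value (ws.ncols) throughout. So in the COM code, you should call find_last_col(ws) ONCE and reuse the returned result from then on. Update: see the answer to your separate question on how to get the equivalent of xlrd's Sheet.ncols from COM.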
(2) Accessing each cell value TWICE slows both versions down. Instead of
if ws.cell_value(6, cnum):
    wsHeaders[str(ws.cell_value(6, cnum))] = (cnum, ws.ncols)
try
value = ws.cell_value(6, cnum)
if value:
    wsHeaders[str(value)] = (cnum, ws.ncols)
Note: there are two instances of this in each code snippet.
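(3) It is not at all obvious what your nested loops are for, but there does seem to be some redundant computation, involving redundant fetches from COM. If you care to tell us what you are trying to achieve, with examples, we may be able to help you make it run much faster. At the very least, extracting the values from COM once and then processing them in nested loops in Python should be faster. How many columns are there?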
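To make the "fetch once, then process in Python" suggestion in (3) concrete, here is a minimal sketch in Python 3 syntax. It assumes `ws` is a win32com Excel Worksheet and relies on `Range.Value` returning a tuple of row tuples for a multi-cell range and a bare scalar for a single cell. `read_header_row` is a hypothetical helper name, not something from the question.

```python
def read_header_row(ws, row, first_col, last_col):
    """Fetch one worksheet row with a fixed number of COM calls.

    Assumption: `ws` is a win32com Excel Worksheet. Columns are 1-based
    and both bounds are inclusive.
    """
    cells = ws.Range(ws.Cells(row, first_col), ws.Cells(row, last_col))
    values = cells.Value
    if not isinstance(values, tuple):
        # A one-cell range yields a bare scalar rather than a tuple of rows.
        return [values]
    # A multi-cell range yields a tuple of rows; only one row was requested.
    return list(values[0])
```

Something like `headers = read_header_row(ws, 7, 9, last_col)` would then give a plain list to loop over, so the number of COM calls no longer grows with the number of columns.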
Update 2: Meanwhile, the little elves had a go at your code with a proctoscope and came up with the following script:
tests= [
    "A/B/C/D",
    "A//C//",
    "A//C//E",
    "A///D",
    "///D",
    ]
for test in tests:
    print "\nTest:", test
    row = test.split("/")
    ncols = len(row)
    # modelling the OP's code
    # (using xlrd-style 0-relative column indexes)
    d = {}
    for cnum in xrange(ncols):
        if row[cnum]:
            k = row[cnum]
            v = (cnum, ncols) #### BUG; should be ncols - 1 ("inclusive")
            print "outer", cnum, k, '=>', v
            d[k] = v
            for cend in xrange(cnum + 1, ncols):
                if row[cend]:
                    k = row[cnum]
                    v = (cnum, cend - 1)
                    print "inner", cnum, cend, k, '=>', v
                    d[k] = v
                    break
    print d
    # modelling a slightly better algorithm
    d = {}
    prev = None
    for cnum in xrange(ncols):
        key = row[cnum]
        if key:
            d[key] = [cnum, cnum]
            prev = key
        elif prev:
            d[prev][1] = cnum
    print d
    # if tuples are really needed (can't imagine why)
    for k in d:
        d[k] = tuple(d[k])
    print d
Output:
Test: A/B/C/D
outer 0 A => (0, 4)
inner 0 1 A => (0, 0)
outer 1 B => (1, 4)
inner 1 2 B => (1, 1)
outer 2 C => (2, 4)
inner 2 3 C => (2, 2)
outer 3 D => (3, 4)
{'A': (0, 0), 'C': (2, 2), 'B': (1, 1), 'D': (3, 4)}
{'A': [0, 0], 'C': [2, 2], 'B': [1, 1], 'D': [3, 3]}
{'A': (0, 0), 'C': (2, 2), 'B': (1, 1), 'D': (3, 3)}
Test: A//C//
outer 0 A => (0, 5)
inner 0 2 A => (0, 1)
outer 2 C => (2, 5)
{'A': (0, 1), 'C': (2, 5)}
{'A': [0, 1], 'C': [2, 4]}
{'A': (0, 1), 'C': (2, 4)}
Test: A//C//E
outer 0 A => (0, 5)
inner 0 2 A => (0, 1)
outer 2 C => (2, 5)
inner 2 4 C => (2, 3)
outer 4 E => (4, 5)
{'A': (0, 1), 'C': (2, 3), 'E': (4, 5)}
{'A': [0, 1], 'C': [2, 3], 'E': [4, 4]}
{'A': (0, 1), 'C': (2, 3), 'E': (4, 4)}
Test: A///D
outer 0 A => (0, 4)
inner 0 3 A => (0, 2)
outer 3 D => (3, 4)
{'A': (0, 2), 'D': (3, 4)}
{'A': [0, 2], 'D': [3, 3]}
{'A': (0, 2), 'D': (3, 3)}
Test: ///D
outer 3 D => (3, 4)
{'D': (3, 4)}
{'D': [3, 3]}
{'D': (3, 3)}
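For reuse, the single-pass idea from the script above can be wrapped in a function. This is only a sketch in Python 3 syntax (the script above is Python 2). `header_spans` is a made-up name, and it assumes the header row has already been read into a plain list, with 0-based column indexes.

```python
def header_spans(row):
    """Map each non-empty header to its (first_col, last_col) span, inclusive.

    Assumption: `row` is a list of cell values; empty cells are falsy
    ('' or None). Empty cells before the first header are ignored.
    """
    spans = {}
    current = None  # name of the header whose span is being extended
    for col, cell in enumerate(row):
        if cell:
            current = str(cell)
            spans[current] = [col, col]
        elif current is not None:
            # An empty cell extends the span of the most recent header.
            spans[current][1] = col
    return {name: tuple(span) for name, span in spans.items()}
```

Each cell is looked at exactly once, so the work is linear in the number of columns. As in the script above, a header name that appears twice keeps only its last span.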
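Answer 1 (score: 1)
COM has to talk to another process, which is what actually handles the requests. xlrd works directly on the data structures themselves.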
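To get a rough feel for how many such requests the nested loops in the question make, here is a toy model in Python 3 syntax. `CountingSheet` is a hypothetical stand-in, not a real COM object: it only counts cell reads, each of which would be a cross-process call under COM. The end column uses `ncols - 1` so that the span is inclusive.

```python
class CountingSheet:
    """Toy stand-in for a worksheet row that counts every cell read."""

    def __init__(self, row):
        self.row = row
        self.reads = 0

    def cell(self, col):
        self.reads += 1
        return self.row[col]


def spans_nested(sheet, ncols):
    """Model of the question's nested loops, reading cells on demand."""
    spans = {}
    for cnum in range(ncols):
        if sheet.cell(cnum):
            spans[str(sheet.cell(cnum))] = (cnum, ncols - 1)
            for cend in range(cnum + 1, ncols):  # look for the next header
                if sheet.cell(cend):
                    spans[str(sheet.cell(cnum))] = (cnum, cend - 1)
                    break
    return spans


sheet = CountingSheet("A//C//E".split("/"))
result = spans_nested(sheet, len(sheet.row))
print(result, "cell reads:", sheet.reads)
```

Even on this five-cell row the loop reads more cells than the row contains, because header cells are re-read and the inner scan revisits cells the outer loop will visit again. With in-memory data that costs almost nothing; when every read crosses a process boundary, it adds up.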
Answer 2 (score: 0)
I thought about it as I was going to bed last night and ended up using this, which is far better than my original version:
def makeDict(ws):
    """makes dict with key as header name,
    value as tuple of column begin and column end (inclusive)"""
    wsHeaders = {} # key is header name, value is column begin and end inclusive
    last_col = find_last_col(ws)
    for cnum in xrange(9, last_col):
        if ws.Cells(7, cnum).Value:
            value = ws.Cells(7, cnum).Value
            cstart = cnum
        if ws.Cells(7, cnum + 1).Value:
            wsHeaders[str(value)] = (cstart, cnum) #cnum is last in range
    return wsHeaders