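I have some data and need to create a sort mapping. The actual sorting is done by C code, which takes the integer list flt_neworder from my code. This is my current solution: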
```python
# Demo data
data = [
    "Option A",  # 0
    "Option B",  # 1
    "Blabla",    # 2
    "Some text"  # 3
]

class Item:
    def __init__(self, label):
        self.label = label

col = [Item(d) for d in data]

# Create sorting mapping
flt_neworder = [
    x[1] for x in sorted(
        zip(
            [x[0] for x in sorted(enumerate(col), key=lambda x: x[1].label)],
            range(len(col))
        )
    )
]
# Output: [1,2,0,3]
print(flt_neworder)
```
Desired output: [1, 2, 0, 3], not [2, 0, 1, 3]!
Index in flt_neworder = index in col; integer value = new position.
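Put differently, flt_neworder is the inverse of the permutation that sorts col (its "argsort"). A minimal sketch, assuming only the labels matter: compute the argsort once, then write each new position back into place, which is O(n log n) overall. The function name `new_positions` is hypothetical; with `col` you would use `key=lambda i: col[i].label` instead.

```python
def new_positions(labels):
    """Return, for each original index, the index that item has after sorting."""
    # order[j] is the original index of the item that ends up at position j
    order = sorted(range(len(labels)), key=labels.__getitem__)
    positions = [0] * len(labels)
    for new_pos, old_pos in enumerate(order):
        positions[old_pos] = new_pos
    return positions

data = ["Option A", "Option B", "Blabla", "Some text"]
print(new_positions(data))  # [1, 2, 0, 3]

# Sanity check: placing each item at its new position yields the sorted list
placed = [None] * len(data)
for old_pos, new_pos in enumerate(new_positions(data)):
    placed[new_pos] = data[old_pos]
print(placed)  # ['Blabla', 'Option A', 'Option B', 'Some text']
```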
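What would be a more efficient, or at least more readable, solution?
I successfully tested this one-liner:
```python
import operator

tuple({k: i for i, (k, v) in enumerate(sorted(enumerate(data), key=operator.itemgetter(1)))}.values())
```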
But it is still hard to read, and I believe I'm exploiting the fact that dict is ordered in the CPython implementation...
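On the dict-ordering point: since Python 3.7 (and already in CPython 3.6) dicts iterate in insertion order, not key order. In this one-liner the keys are inserted in sorted-label order (2, 0, 1, 3), so `.values()` comes back as 0, 1, 2, 3 rather than the desired result. The sketch below shows this and an order-independent variant that looks each rank up by its original index; the names `rank_of` and `flt_neworder` here are just for illustration.

```python
import operator

data = ["Option A", "Option B", "Blabla", "Some text"]

# rank_of[original_index] = position after sorting
rank_of = {k: i for i, (k, v) in enumerate(sorted(enumerate(data), key=operator.itemgetter(1)))}

# With insertion-ordered dicts the keys come out as 2, 0, 1, 3
print(list(rank_of))            # [2, 0, 1, 3]
print(tuple(rank_of.values()))  # (0, 1, 2, 3), not the desired order

# Looking each index up explicitly does not depend on dict iteration order
flt_neworder = [rank_of[i] for i in range(len(data))]
print(flt_neworder)             # [1, 2, 0, 3]
```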
EDIT
Another solution I came up with:
```python
flt_neworder = [None] * len(col)
for j, (_, i) in enumerate(sorted(zip((item.label for item in col), range(len(col))))):
    flt_neworder[i] = j
```
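Another one, but it is slow:
```python
from operator import itemgetter as get  # `get` was not defined in the original snippet; assumed to be itemgetter

flt_neworder = list(map(get(0), sorted(enumerate(sorted(enumerate(item.label for item in col), key=get(1))), key=get(1))))
```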
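The slow variant works because sorting twice inverts the permutation: sorting the indices by label gives the argsort, and argsorting that result gives its inverse, i.e. each item's rank. A small sketch spelling out the two steps with named intermediates; the helper `argsort` is hypothetical, not a standard-library function.

```python
from operator import itemgetter

def argsort(seq):
    """Indices that would sort seq (stable, like sorted())."""
    return [i for i, _ in sorted(enumerate(seq), key=itemgetter(1))]

labels = ["Option A", "Option B", "Blabla", "Some text"]

order = argsort(labels)  # original index found at each sorted position
ranks = argsort(order)   # sorted position of each original index
print(order)  # [2, 0, 1, 3]
print(ranks)  # [1, 2, 0, 3]
```

Each `argsort` call is O(n log n), so this is asymptotically no worse than one sort plus a write-back loop, but it does pay for a second sort.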
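Thanks to Ryan P for the alternative solutions and the timing script!
I tested the solutions on a large dataset (1k unique strings, full script), and compared with Ryan's small set the timings differ surprisingly: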
orig: 2.116799074272876
origmod: 2.118176033553482
orignew: 1.1691872433702883
orig3: 1.4400411206224817
orig4: 2.0643228139915664
rewrite: 26.06907118537356
rewriteop: 25.91357442379376
rewriteuniq: 10.783081019086694
The winner is orignew(); rewriteuniq() is fast for small datasets but not very good for large ones.
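Part of the explanation for the scaling difference: rewrite, rewriteop and rewriteuniq (in the answer below) call `sdata.index(x)` once per item, and each call is a linear scan, so the comprehension is O(n²). With 1,000 strings that is on the order of half a million comparisons per call, versus roughly n log n for the sort itself. A minimal sketch, assuming the labels are unique, that keeps the rewriteuniq shape but replaces the linear search with a dict lookup; `rewriteuniq_dict` is a hypothetical name.

```python
def rewriteuniq_dict(data):
    """Like rewriteuniq, but with an O(1) dict lookup instead of list.index.

    Assumes every label is unique: with duplicates, later entries overwrite
    earlier ones in the dict and the result is no longer a permutation.
    """
    position = {label: i for i, label in enumerate(sorted(data))}
    return [position[label] for label in data]

data = ["Option A", "Option B", "Blabla", "Some text"]
print(rewriteuniq_dict(data))  # [1, 2, 0, 3]
```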
Answer (score: 3)
This is faster than the original code and, IMO, easier to read:
```python
data = [
    'Option A',
    'Option B',
    'Blabla',
    'Some text'
]
idata = list(enumerate(data))                   # add indexes to uniquely identify items
sdata = sorted(idata, key=lambda x: x[1])       # sort the items by label
flt_neworder = [sdata.index(x) for x in idata]  # find the position to move to
```
timeit results:
orig: 12.3757910728
origmod: 7.85222291946
orignew: 6.15745902061
rewrite: 6.31552696228
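(origmod is like your original code but without the Item class, since you don't seem to use it; orignew is your one-liner.)
Your one-liner is slightly faster, but I find it harder to read.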
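OK, this time I'm including my full test code. I moved the creation of the Item objects out of orig, since you only create them to mimic real-world data. Besides orig3 (your new code) and rewriteop (rewrite with operator.itemgetter), I also added an extra test, rewriteuniq, for the case where the labels are guaranteed to be unique.
Results: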
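The uniqueness assumption matters because `list.index` returns the first match. rewriteuniq sorts the bare labels, so with a duplicated label both copies map to the same position and the result is no longer a permutation; rewrite sorts `(index, label)` pairs, which are always distinct. A quick check with a small, made-up list containing a duplicate:

```python
dup = ["Option A", "Blabla", "Option A"]

# rewriteuniq's approach: index() finds the first "Option A" both times
sdata = sorted(dup)
uniq_result = [sdata.index(x) for x in dup]
print(uniq_result)    # [1, 0, 1]: position 2 is never used

# rewrite's approach: (index, label) pairs are unique, and sorted() is stable
idata = list(enumerate(dup))
sdata_pairs = sorted(idata, key=lambda x: x[1])
stable_result = [sdata_pairs.index(x) for x in idata]
print(stable_result)  # [1, 0, 2]
```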
orig: 7.641715765
origmod: 7.38071417809
orignew: 5.82565498352
orig3: 5.67061495781
rewrite: 5.95284795761
rewriteop: 5.61896586418
rewriteuniq: 1.90719294548
Code:
```python
import operator
from timeit import timeit

data = [
    'Option A',
    'Option B',
    'Blabla',
    'Some text',
]
desired_output = [1, 2, 0, 3]

class Item:
    def __init__(self, label):
        self.label = label

# Built once, outside the timed functions, to mimic the real-world data
col = [Item(d) for d in data]

def orig():
    flt_neworder = [
        x[1] for x in sorted(
            zip(
                [x[0] for x in sorted(enumerate(col), key=lambda x: x[1].label)],
                range(len(col))
            )
        )
    ]
    assert flt_neworder == desired_output

def origmod():
    flt_neworder = [
        x[1] for x in sorted(
            zip(
                [x[0] for x in sorted(enumerate(data), key=lambda x: x[1])],
                range(len(data))
            )
        )
    ]
    assert flt_neworder == desired_output

def orignew():
    # Look the ranks up by index: on Python 3.7+ the dict iterates in
    # insertion order (2, 0, 1, 3), so .values() alone would give [0, 1, 2, 3]
    ranks = {k: i for i, (k, v) in enumerate(sorted(enumerate(data), key=operator.itemgetter(1)))}
    flt_neworder = [ranks[k] for k in range(len(data))]
    assert flt_neworder == desired_output

def orig3():
    flt_neworder = [None] * len(col)
    for j, (_, i) in enumerate(sorted(zip((item.label for item in col), range(len(col))))):
        flt_neworder[i] = j
    assert flt_neworder == desired_output

def rewrite():
    idata = list(enumerate(data))
    sdata = sorted(idata, key=lambda x: x[1])
    flt_neworder = [sdata.index(x) for x in idata]
    assert flt_neworder == desired_output

def rewriteop():
    idata = list(enumerate(data))
    sdata = sorted(idata, key=operator.itemgetter(1))
    flt_neworder = [sdata.index(x) for x in idata]
    assert flt_neworder == desired_output

def rewriteuniq():
    # Only valid when all labels are unique
    sdata = sorted(data)
    flt_neworder = [sdata.index(x) for x in data]
    assert flt_neworder == desired_output

for fn in (orig, origmod, orignew, orig3, rewrite, rewriteop, rewriteuniq):
    print(fn.__name__ + ':', timeit(fn))
```
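The 1k-string timings in the question come from a separate script that isn't reproduced here. A rough sketch of how such a size comparison could be set up, assuming randomly generated unique labels; the generator `make_labels`, the two function names, the sizes and `number=10` are all arbitrary, hypothetical choices. Both approaches are written as pure functions taking the data as a parameter so the sketch is self-contained.

```python
import random
import string
from timeit import timeit

def make_labels(n, seed=0):
    """Generate n unique random 8-letter labels in a shuffled order."""
    rng = random.Random(seed)
    labels = set()
    while len(labels) < n:
        labels.add("".join(rng.choices(string.ascii_letters, k=8)))
    labels = sorted(labels)  # fix the order before shuffling, for reproducibility
    rng.shuffle(labels)
    return labels

def via_index(data):
    # rewriteuniq-style: list.index inside a comprehension, O(n^2)
    sdata = sorted(data)
    return [sdata.index(x) for x in data]

def via_scatter(data):
    # argsort, then write each new position back: O(n log n)
    order = sorted(range(len(data)), key=data.__getitem__)
    positions = [0] * len(data)
    for new_pos, old_pos in enumerate(order):
        positions[old_pos] = new_pos
    return positions

for n in (4, 100, 1000):
    data = make_labels(n)
    assert via_index(data) == via_scatter(data)
    for fn in (via_index, via_scatter):
        print(n, fn.__name__, timeit(lambda: fn(data), number=10))
```

Absolute numbers will vary by machine and Python version; the point of interest is how the gap between the two grows with n.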