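Order mapping - more efficient Python solution?

Time: 2014-04-23 22:23:07

Tags: python sorting mapping

I have some data for which I need to create a sort mapping. The actual sorting is done by C code, which takes the list of integers flt_neworder from my code. This is my current solution: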

# Demo data
data = [
    "Option A",  # 0
    "Option B",  # 1
    "Blabla",    # 2
    "Some text"  # 3
]

class Item:
    def __init__(self, label):
        self.label = label

col = [Item(d) for d in data]

# Create sorting mapping
flt_neworder = [
   x[1] for x in sorted(
       zip(
           [x[0] for x in sorted(enumerate(col), key=lambda x: x[1].label)],
           range(len(col))
       )
   )
]

# Output: [1,2,0,3]
print(flt_neworder)
  • Desired output: [1,2,0,3], not [2,0,1,3]
  • Position in flt_neworder = original index of the item in col
  • Integer = new position
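
To make the mapping above concrete, here is a minimal sketch of how the two lists relate. The function name order_mapping is made up for illustration and is not part of the original code; it works on plain label strings rather than Item objects.

```python
def order_mapping(labels):
    """Return a list where entry i is the new position of labels[i] after sorting."""
    # Original indices in sorted order, e.g. [2, 0, 1, 3] for the demo data.
    order = sorted(range(len(labels)), key=labels.__getitem__)
    mapping = [None] * len(labels)
    # Invert that list: the item that came from orig_idx ends up at new_pos.
    for new_pos, orig_idx in enumerate(order):
        mapping[orig_idx] = new_pos
    return mapping


data = ["Option A", "Option B", "Blabla", "Some text"]
print(order_mapping(data))  # [1, 2, 0, 3]
```

So [2,0,1,3] is the list of original indices in sorted order, and [1,2,0,3] is its inverse, which is what the C code expects.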

Is there a more efficient, or at least more readable, solution?

I successfully tested this one-liner:

tuple({k: i for i, (k, v) in enumerate(sorted(enumerate(data), key=operator.itemgetter(1)))}.values())

But it is still hard to read, and I believe I am relying on the fact that dicts happen to come out ordered in the CPython implementation...
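
The worry about dict ordering is justified. From Python 3.7 on, dicts are guaranteed to preserve insertion order, so .values() in the one-liner above would return the positions in sorted order ([0, 1, 2, 3] for the demo data) rather than in key order. A sketch that keeps the dict but reads it back by explicit index, so it does not depend on iteration order (new_pos is just an illustrative name):

```python
import operator

data = ["Option A", "Option B", "Blabla", "Some text"]

# (original index, label) pairs, sorted by label.
ranked = sorted(enumerate(data), key=operator.itemgetter(1))

# Map original index -> new position.
new_pos = {orig: new for new, (orig, _label) in enumerate(ranked)}

# Look the positions up in index order instead of trusting dict iteration order.
flt_neworder = [new_pos[i] for i in range(len(data))]
print(flt_neworder)  # [1, 2, 0, 3]
```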

Edit

Another solution I came up with:

flt_neworder = [None] * len(col)
for j, (_, i) in enumerate(sorted(zip((item.label for item in col), range(len(col))))):
    flt_neworder[i] = j

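And another one, but it is slow:

# Assumption: get is not defined in the post; presumably
# from operator import itemgetter as get
flt_neworder = list(map(get(0), sorted(enumerate(sorted(enumerate(item.label for item in col), key=get(1))), key=get(1))))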


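Thanks to Ryan P for the alternative solutions and the timing script!

I tested the solutions on a large data set (1k unique strings, full script), and the timings differ surprisingly from those on Ryan's small set: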

orig: 2.116799074272876
origmod: 2.118176033553482
orignew: 1.1691872433702883
orig3: 1.4400411206224817
orig4: 2.0643228139915664
rewrite: 26.06907118537356
rewriteop: 25.91357442379376
rewriteuniq: 10.783081019086694

The winner is orignew(). rewriteuniq() is fast for small data sets, but does not do well on large ones.
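
A likely explanation, not measured here, is that rewriteuniq() calls list.index() once per item. list.index() scans the list linearly, so the total work grows roughly quadratically with the number of items. A sketch of the same idea using a dict lookup instead (rank_unique is a hypothetical name, and like rewriteuniq it assumes all labels are unique):

```python
def rank_unique(labels):
    """New position of each label after sorting; assumes labels are unique."""
    # Build label -> sorted position once, then look each label up in O(1).
    position = {label: i for i, label in enumerate(sorted(labels))}
    return [position[label] for label in labels]


data = ["Option A", "Option B", "Blabla", "Some text"]
print(rank_unique(data))  # [1, 2, 0, 3]
```

I have not timed this against the other variants, so treat it as something to try rather than a known winner.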

1 answer:

Answer 0: (score: 3)

This is faster than the original code, and IMO easier to read:

data = [
    'Option A',
    'Option B',
    'Blabla',
    'Some text'
]
idata = list(enumerate(data))  # add indexes to uniquely identify items
sdata = sorted(idata, key=lambda x: x[1])  # sort the items by label
flt_neworder = [sdata.index(x) for x in idata]  # find the position to move to

Results from timeit:

orig: 12.3757910728
origmod: 7.85222291946
orignew: 6.15745902061
rewrite: 6.31552696228

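(origmod is like your original code, but without the Item class, since it doesn't look like you actually use it; orignew is your one-liner.)

Your one-liner is slightly faster, but I find it harder to read.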


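OK, this time I'll include my complete test code. I moved the Item creation out of orig, since you only create those objects to mimic real-world data. Besides orig3 (your new code) and rewriteop (rewrite using operator.itemgetter), I also added an extra test, rewriteuniq, which assumes the labels will be unique.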
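
To illustrate why the uniqueness assumption matters, here is a small sketch with a duplicated label (the sample list is made up for this example). Looking up bare labels gives both duplicates the same position, while looking up (index, label) pairs, as rewrite does, keeps them apart:

```python
labels = ["b", "a", "b"]  # hypothetical data with a duplicate label

# rewriteuniq-style: index() returns the first match, so both "b" items collide.
sdata = sorted(labels)
by_label = [sdata.index(x) for x in labels]
print(by_label)  # [1, 0, 1]

# rewrite-style: the (index, label) pairs are unique, so every item gets its own slot.
idata = list(enumerate(labels))
spairs = sorted(idata, key=lambda x: x[1])
by_pair = [spairs.index(x) for x in idata]
print(by_pair)  # [1, 0, 2]
```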

Results:

orig: 7.641715765
origmod: 7.38071417809
orignew: 5.82565498352
orig3: 5.67061495781
rewrite: 5.95284795761
rewriteop: 5.61896586418
rewriteuniq: 1.90719294548

Code:

import operator
from timeit import timeit

data = [
    'Option A',
    'Option B',
    'Blabla',
    'Some text',
]

desired_output = [1, 2, 0, 3]

class Item:
    def __init__(self, label):
        self.label = label

col = [Item(d) for d in data]


def orig():
    flt_neworder = [
        x[1] for x in sorted(
            zip(
                [x[0] for x in sorted(enumerate(col), key=lambda x: x[1].label)],
                range(len(col))
            )
        )
    ]

    assert flt_neworder == desired_output

def origmod():
    flt_neworder = [
        x[1] for x in sorted(
            zip(
                [x[0] for x in sorted(enumerate(data), key=lambda x: x[1])],
                range(len(data))
            )
        )
    ]

    assert flt_neworder == desired_output

def orignew():
    flt_neworder = list({k: i for i, (k, v) in enumerate(sorted(enumerate(data), key=operator.itemgetter(1)))}.values())
    assert flt_neworder == desired_output

def orig3():
    flt_neworder = [None] * len(col)
    for j, (_, i) in enumerate(sorted(zip((item.label for item in col), range(len(col))))):
        flt_neworder[i] = j

    assert flt_neworder == desired_output

def rewrite():
    idata = list(enumerate(data))
    sdata = sorted(idata, key=lambda x: x[1])
    flt_neworder = [sdata.index(x) for x in idata]

    assert flt_neworder == desired_output

def rewriteop():
    idata = list(enumerate(data))
    sdata = sorted(idata, key=operator.itemgetter(1))
    flt_neworder = [sdata.index(x) for x in idata]

    assert flt_neworder == desired_output

def rewriteuniq():
    sdata = sorted(data)
    flt_neworder = [sdata.index(x) for x in data]

    assert flt_neworder == desired_output

for fn in (orig, origmod, orignew, orig3, rewrite, rewriteop, rewriteuniq):
    print fn.__name__ + ':', timeit(fn)