Question

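I would like to split a string as described in the title, in a single call. I'm looking for a simple syntax using a list comprehension, but I haven't found it yet: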

s = "123456"

The result would be:

["12", "34", "56"]

What I don't want:

re.split('(?i)([0-9a-f]{2})', s)
s[0:2], s[2:4], s[4:6]
[s[i*2:i*2+2] for i in range(len(s) // 2)]

Edit:

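OK, I want to parse a hex RGB[A] color (and possibly other color/component formats) to extract all of its components. It seems the fastest approach is the last one from sven-marnach: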

sven-marnach xrange: 0.883 usec per loop

python -m timeit -s 's="aabbcc";' '[int(s[i:i+2], 16) / 255. for i in xrange(0, len(s), 2)]'

pair / iter: 1.38 usec per loop

python -m timeit -s 's="aabbcc"' '["%c%c" % pair for pair in zip(* 2 * [iter(s)])]'

regex: 2.55 usec per loop

python -m timeit -s 'import re; s="aabbcc"; c=re.compile("(?i)([0-9a-f]{2})"); split=re.split' '[int(x, 16) / 255. for x in split(c, s) if x != ""]'

Answer 1

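Reading through the comments, it turns out the actual question is: what is the fastest way to parse a color definition string in hexadecimal RRGGBBAA format? Here are some options: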

import struct

def rgba1(s, unpack=struct.unpack):
    return unpack("BBBB", s.decode("hex"))

def rgba2(s, int=int, xrange=xrange):
    return [int(s[i:i+2], 16) for i in xrange(0, 8, 2)]

def rgba3(s, int=int, xrange=xrange):
    x = int(s, 16)
    return [(x >> i) & 255 for i in xrange(0, 32, 8)]

As I expected, the first version is the fastest:

In [6]: timeit rgba1("aabbccdd")
1000000 loops, best of 3: 1.44 us per loop

In [7]: timeit rgba2("aabbccdd")
100000 loops, best of 3: 2.43 us per loop

In [8]: timeit rgba3("aabbccdd")
100000 loops, best of 3: 2.44 us per loop
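The three functions above are written for Python 2 (s.decode("hex") and xrange are not available in Python 3). As a rough sketch only, and assuming Python 3, the same ideas could look like the following. The use of bytes.fromhex and the high-to-low shift order in rgba3 are choices of this sketch, not part of the original answer; the original rgba3 shifts from the low byte upward, so it yields the components in A, B, G, R order.

```python
import struct


def rgba1(s):
    # bytes.fromhex stands in for the Python 2 idiom s.decode("hex")
    return struct.unpack("BBBB", bytes.fromhex(s))


def rgba2(s):
    # Slice two hex digits at a time and convert each pair
    return [int(s[i:i + 2], 16) for i in range(0, 8, 2)]


def rgba3(s):
    x = int(s, 16)
    # Shift from the most significant byte down so the order is R, G, B, A
    return [(x >> shift) & 255 for shift in range(24, -8, -8)]


print(rgba1("aabbccdd"))  # (170, 187, 204, 221)
print(rgba2("aabbccdd"))  # [170, 187, 204, 221]
print(rgba3("aabbccdd"))  # [170, 187, 204, 221]
```

The timings quoted above were measured on the Python 2 versions, so they should not be assumed to carry over to this sketch.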

Answer 2

In [4]: ["".join(pair) for pair in zip(* 2 * [iter(s)])]
Out[4]: ['aa', 'bb', 'cc']

See: How does zip(*[iter(s)]*n) work in Python? for an explanation of that strange "two iters over the same str" syntax.
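To make the trick a little less mysterious, here is a small sketch of what happens. The helper name chunks is made up for illustration. The key point is that the list holds the same iterator object n times, so zip pulls consecutive characters from one shared iterator.

```python
s = "aabbcc"

# Spelled out: both arguments to zip are the *same* iterator,
# so each tuple consumes two consecutive characters.
it = iter(s)
pairs = ["".join(pair) for pair in zip(it, it)]
print(pairs)  # ['aa', 'bb', 'cc']


def chunks(s, n):
    # Hypothetical helper: generalises the idiom to groups of n characters.
    # zip stops at the shortest input, so a trailing partial group is dropped.
    return ["".join(group) for group in zip(*[iter(s)] * n)]


print(chunks("aabbcc", 2))  # ['aa', 'bb', 'cc']
print(chunks("abcde", 2))   # ['ab', 'cd']
```

Note the silent truncation in the last call: if the input length is not a multiple of n, the leftover characters are discarded.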

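You say in a comment that you want "the fastest execution". I can't promise you that with this implementation, but you can measure the execution time using timeit. Remember what Donald Knuth said about premature optimisation, of course. For the problem at hand (now that you've revealed it), I think you'd find r, g, b = s[0:2], s[2:4], s[4:6] hard to beat.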

$ python3.2 -m timeit -c '
s = "aabbcc"
["".join(pair) for pair in zip(* 2 * [iter(s)])]
'
100000 loops, best of 3: 4.49 usec per loop

compared with

$ python3.2 -m timeit -c '
s = "aabbcc"
r, g, b = s[0:2], s[2:4], s[4:6]
'
1000000 loops, best of 3: 1.2 usec per loop
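If you would rather run such comparisons from a script than from the shell, something along these lines should work. This is only a sketch assuming Python 3; the candidates dictionary and its labels are made up for illustration, and the numbers you get will depend on your machine and interpreter.

```python
import timeit

setup = 's = "aabbcc"'

# Statements to compare; the labels are arbitrary.
candidates = {
    "zip/iter": '["".join(pair) for pair in zip(*2 * [iter(s)])]',
    "slices": "r, g, b = s[0:2], s[2:4], s[4:6]",
    "comprehension": "[s[i:i+2] for i in range(0, len(s), 2)]",
}

number = 10000
results = {}
for name, stmt in candidates.items():
    # Take the best of several repeats to reduce noise from other processes
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=3))
    results[name] = best / number * 1e6  # microseconds per loop
    print("%-14s %.3f usec per loop" % (name, results[name]))
```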

Answer 3

Numpy is worse than your preferred solution for a single lookup:

$ python -m timeit -s 'import numpy as np; s="aabbccdd"' 'a = np.fromstring(s.decode("hex"), dtype="uint32"); a.dtype = "uint8"; list(a)'
100000 loops, best of 3: 5.14 usec per loop
$ python -m timeit -s 's="aabbcc";' '[int(s[i:i+2], 16) / 255. for i in xrange(0, len(s), 2)]'
100000 loops, best of 3: 2.41 usec per loop

But if you are doing several conversions at once, numpy is much faster:

$ python -m timeit -s 'import numpy as np; s="aabbccdd" * 100' 'a = np.fromstring(s.decode("hex"), dtype="uint32"); a.dtype = "uint8"; a.tolist()'
10000 loops, best of 3: 59.6 usec per loop
$ python -m timeit -s 's="aabbccdd" * 100;' '[int(s[i:i+2], 16) / 255. for i in xrange(0, len(s), 2)]'
1000 loops, best of 3: 240 usec per loop

On my computer, numpy is faster for batch sizes greater than 2. You can easily group the values by setting a.shape to (number_of_colors, 4), though this makes the tolist method 50% slower.

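In fact, most of the time is spent converting the array to a list. Depending on what you want to do with the results, you may be able to skip this intermediate step and gain some speed: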

$ python -m timeit -s 'import numpy as np; s="aabbccdd" * 100' 'a = np.fromstring(s.decode("hex"), dtype="uint32"); a.dtype = "uint8"; a.shape = (100,4)'
100000 loops, best of 3: 6.76 usec per loop
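The commands above rely on Python 2 (s.decode("hex")) and on np.fromstring, which newer numpy releases deprecate for binary data. A possible Python 3 sketch of the same batch conversion, assuming numpy is installed and using np.frombuffer instead, might look like this. It reads the bytes directly as uint8 rather than going through uint32, which is a simplification of this sketch.

```python
import numpy as np

s = "aabbccdd" * 100  # 100 RRGGBBAA colors back to back

# Decode the hex text to raw bytes, view them as uint8,
# and group them into one row of 4 components per color.
a = np.frombuffer(bytes.fromhex(s), dtype=np.uint8).reshape(-1, 4)

print(a.shape)        # (100, 4)
print(a[0].tolist())  # [170, 187, 204, 221]

# Staying in numpy avoids the costly conversion to a Python list,
# e.g. when scaling every component to the range 0..1.
normalized = a / 255.0
```

The array returned by np.frombuffer is a read-only view of the bytes object, so copy it first if you need to modify the components in place.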

Split string "aabbcc" -> ["aa", "bb", "cc"] without re.split
