Looking for an alternative, more Pythonic way to iterate over a string several characters at a time

Posted: 2013-11-22 17:29:18

Tags: python loops iteration

As you know, a simple and reliable way to "select" single characters, pairs, triples, etc. from a string is:

somestr = 'ABCDABCDABCDABCDABCDABCD'
a = 0
z = 3
for i in somestr:
    i = somestr[a:z]
    # finally, here I can work with the current 3 characters of somestr
    a += 1  # or 3 for non-overlapping
    z += 1  # z must move by the same step as a

So my question is how to simplify this code in a more Pythonic way. I'm interested in both the overlapping and the non-overlapping case.
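For reference, replaying the loop above shows two quirks worth keeping in mind when simplifying it. Because the loop runs once per character of somestr, the overlapping version ends with slices shorter than 3 characters, and the step-3 version keeps going after a passes the end of the string, producing empty strings. The helper collect below is hypothetical; it only records what the loop produces.

```python
somestr = 'ABCDABCDABCDABCDABCDABCD'  # 24 characters

def collect(step, size=3):
    # Hypothetical helper: replays the question's loop and records every slice.
    a, z = 0, size
    seen = []
    for _ in somestr:  # one iteration per character, whatever the step is
        seen.append(somestr[a:z])
        a += step
        z += step
    return seen

overlapping = collect(1)
non_overlapping = collect(3)
print(overlapping[-3:])            # ['BCD', 'CD', 'D']
print(non_overlapping.count(''))   # 16 empty slices after the 8 real chunks
```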

5 answers:

Answer 0 (score: 2)

Python's range function takes a step argument, so for the simplest (non-overlapping) case you can do:

for i in range(0, len(somestr) - 3 + 1, 3):
    chunk = somestr[i:i+3]  # non-overlapping triples; the + 1 keeps the last full chunk
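The stop value of range is exclusive, and the last start index that still leaves room for a full chunk is len(s) - size. A quick check on the 24-character string from the question (chunk_starts is a hypothetical helper name) shows that a stop of len(s) - 3 misses the chunk starting at index 21, while len(s) - 3 + 1 includes it:

```python
somestr = 'ABCDABCDABCDABCDABCDABCD'  # 24 characters

def chunk_starts(length, size):
    # Hypothetical helper: the last full chunk starts at length - size,
    # so the exclusive stop must be length - size + 1.
    return list(range(0, length - size + 1, size))

print(list(range(0, len(somestr) - 3, 3)))  # [0, 3, 6, 9, 12, 15, 18]
print(chunk_starts(len(somestr), 3))        # [0, 3, 6, 9, 12, 15, 18, 21]
triples = [somestr[i:i + 3] for i in chunk_starts(len(somestr), 3)]
print(triples)  # ['ABC', 'DAB', 'CDA', 'BCD', 'ABC', 'DAB', 'CDA', 'BCD']
```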

You can create a generator function as follows:

def substring_generator(string, length, overlap=True):
    for i in range(0, len(string) - length + 1, 1 if overlap else length):
        yield string[i:i+length]

and use it for both cases:

>>> print([x for x in substring_generator("ABCDEFG", 3, True)])
['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
>>> print([x for x in substring_generator("ABCDEFG", 3, False)])
['ABC', 'DEF']
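Two edge cases of this generator follow directly from its range bound: when the length of the string is not a multiple of length, the non-overlapping pass silently drops the incomplete tail, and a string shorter than length yields nothing. The sample strings below are arbitrary; list(...) is used instead of a list comprehension to drain the generator.

```python
def substring_generator(string, length, overlap=True):
    # Same generator as above: step 1 when overlapping, otherwise step `length`.
    for i in range(0, len(string) - length + 1, 1 if overlap else length):
        yield string[i:i + length]

# 'ABCDEFGH' has 8 characters: the non-overlapping pass stops after 'DEF',
# so the incomplete tail 'GH' is dropped.
print(list(substring_generator('ABCDEFGH', 3, overlap=False)))  # ['ABC', 'DEF']
print(list(substring_generator('ABCDEFGH', 3)))  # ['ABC', 'BCD', 'CDE', 'DEF', 'EFG', 'FGH']
# A string shorter than `length` produces an empty range, hence no output.
print(list(substring_generator('AB', 3)))  # []
```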

Answer 1 (score: 2)

A regular expression can do this job easily:

>>> from re import findall
>>> somestr = 'ABCDABCDABCDABCDABCDABCD'
>>> # non-overlapping
>>> for i in findall(".{3}", somestr):
...     print(i)
...
ABC
DAB
CDA
BCD
ABC
DAB
CDA
BCD
>>> # overlapping
>>> for i in findall("(?=(.{3}))", somestr):
...     print(i)
...
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
>>>

Note that I've set this up to use groups of 3; change the 3 in {3} to get groups of any other size.
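One way to make the group size a parameter is to build the pattern from n, sketched below with a hypothetical helper regex_chunks. In an f-string, doubled braces produce literal { and }, so rf'.{{{n}}}' becomes '.{3}' for n = 3. The sketch also passes re.DOTALL, because by default '.' does not match a newline and findall would silently skip over it.

```python
import re

def regex_chunks(s, n, overlap=False):
    # Hypothetical helper: size-n chunks of s via re.findall.
    if overlap:
        # Zero-width lookahead: the match consumes nothing, so the engine
        # advances one character at a time; the group captures the n chars ahead.
        pattern = rf'(?=(.{{{n}}}))'
    else:
        pattern = rf'.{{{n}}}'
    # re.DOTALL lets '.' match newline characters too.
    return re.findall(pattern, s, flags=re.DOTALL)

print(regex_chunks('ABCDEFG', 3))                # ['ABC', 'DEF']
print(regex_chunks('ABCDEFG', 3, overlap=True))  # ['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
print(regex_chunks('AB\nCD', 2))                 # ['AB', '\nC']
print(re.findall('.{2}', 'AB\nCD'))              # ['AB', 'CD']: the newline is skipped
```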

Answer 2 (score: 1)

You can use the grouper recipe from the itertools documentation:

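import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    # n references to the SAME iterator, so each tuple takes n consecutive items
    args = [iter(iterable)] * n
    # zip_longest is the Python 3 name; Python 2 called it itertools.izip_longest
    return itertools.zip_longest(*args, fillvalue=fillvalue)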

This lets you do:

>>> [''.join(item) for item in grouper(somestr, 3)]
['ABC', 'DAB', 'CDA', 'BCD', 'ABC', 'DAB', 'CDA', 'BCD']
>>> [''.join(item) for item in grouper(somestr, 4)]
['ABCD', 'ABCD', 'ABCD', 'ABCD', 'ABCD', 'ABCD']

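Note that you need fillvalue when the last chunk doesn't have enough characters: with the default fillvalue=None, ''.join raises a TypeError on the padded tuple, so pass fillvalue='':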

>>> [''.join(item) for item in grouper(somestr, 5, fillvalue='')]
['ABCDA', 'BCDAB', 'CDABC', 'DABCD', 'ABCD']
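On Python 3.12 and later, itertools.batched produces the same non-overlapping chunks without any padding; the last batch is simply shorter. The sketch below falls back to a rough islice-based equivalent on older versions (the fallback is an assumption of this sketch, not part of the standard library):

```python
import itertools

try:
    batched = itertools.batched  # Python 3.12+
except AttributeError:
    def batched(iterable, n):
        # Rough fallback: pull up to n items at a time until the iterator is empty.
        it = iter(iterable)
        while batch := tuple(itertools.islice(it, n)):
            yield batch

somestr = 'ABCDABCDABCDABCDABCDABCD'
print([''.join(b) for b in batched(somestr, 5)])
# ['ABCDA', 'BCDAB', 'CDABC', 'DABCD', 'ABCD']
```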

Answer 3 (score: 1)

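The simplest way to process chunks of data is to zip a single shared iterator with itself (itertools.izip in Python 2, the built-in zip in Python 3):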

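def chunks(iterable, size=2):
    # Python 3's built-in zip is lazy, like Python 2's itertools.izip
    it = iter(iterable)
    # the same iterator repeated size times; an incomplete final chunk is dropped
    return zip(*[it] * size)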
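The same zip idea also covers the overlapping case: zipping the string with copies of itself shifted by 1, 2, ..., size - 1 yields every window, and zip stops at the shortest input so each tuple is complete. The sketch below repeats chunks so it runs on its own; windows is a hypothetical name.

```python
def chunks(iterable, size=2):
    # Non-overlapping: one shared iterator, so each tuple takes `size` new items.
    it = iter(iterable)
    return zip(*[it] * size)

def windows(s, size=2):
    # Hypothetical helper for the overlapping case: zip s with s[1:], s[2:], ...
    return zip(*(s[i:] for i in range(size)))

print([''.join(t) for t in chunks('ABCDEFG', 3)])   # ['ABC', 'DEF']  ('G' is dropped)
print([''.join(t) for t in windows('ABCDEFG', 3)])  # ['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
```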

Answer 4 (score: 1)

how_many = 3
every_other = 3
# start positions 0, every_other, 2*every_other, ...; take how_many characters from each
# (looking a start up with somestr.index(x) would find the FIRST occurrence of x,
# which only works by accident on a repeating string like this one)
three_at_a_time_skipping_three = [somestr[i:i+how_many] for i in range(len(somestr))[::every_other]]
print(three_at_a_time_skipping_three)
['ABC', 'DAB', 'CDA', 'BCD', 'ABC', 'DAB', 'CDA', 'BCD']

how_many = 4
four_at_a_time_skipping_three = [somestr[i:i+how_many] for i in range(len(somestr))[::every_other]]
print(four_at_a_time_skipping_three)
['ABCD', 'DABC', 'CDAB', 'BCDA', 'ABCD', 'DABC', 'CDAB', 'BCD']

Adjusting how_many and every_other will give you various results.

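This is super ugly and super hard to read, but the general idea is that it slices somestr starting at each selected position and takes how_many characters from there. The [::every_other] part makes it skip ahead every_other characters between starting points.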
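Why the starting positions should come from the indices rather than from looking each character up: str.index returns the first occurrence of a character, so on a string with repeated characters two different starting points can collapse into one. The sample string 'ABCAXY' below is arbitrary and chosen only to show the difference.

```python
s = 'ABCAXY'
how_many = 3
every_other = 3

# Looks each stepped character up by value: .index() returns its FIRST occurrence,
# so the 'A' at position 3 is mapped back to position 0.
by_index = [s[s.index(x):s.index(x) + how_many] for x in s[::every_other]]
# Uses the start positions themselves, so repeated characters do not matter.
by_position = [s[i:i + how_many] for i in range(len(s))[::every_other]]

print(by_index)     # ['ABC', 'ABC']
print(by_position)  # ['ABC', 'AXY']
```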