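As you know, a simple and reliable way to "select" single characters, pairs, triplets and so on from a string is:
somestr = 'ABCDABCDABCDABCDABCDABCD'
a = 0
z = 3
for i in somestr:
    i = somestr[a:z]
    # finally, here I can work with these first 3 characters of somestr
    a += 1  # or 3 for non-overlapping
    z += 1  # same step as a
So my question is how to simplify this code in a Pythonic way. I am interested in both the overlapping and the non-overlapping case.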
Answer 0 (score: 2)
Python's range function takes a step argument, so for the simplest case you can do:
# "- 3 + 1" so that the final full group of 3 is not skipped
for i in range(0, len(somestr) - 3 + 1, 3):
    print(somestr[i:i+3])
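You can create a generator function as follows:
def substring_generator(string, length, overlap=True):
    # step by 1 for overlapping substrings, by length for non-overlapping ones
    for i in range(0, len(string) - length + 1, 1 if overlap else length):
        yield string[i:i+length]
and use it for both cases: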
>>> print([x for x in substring_generator("ABCDEFG", 3, True)])
['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
>>> print([x for x in substring_generator("ABCDEFG", 3, False)])
['ABC', 'DEF']
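Note that in the non-overlapping run above the trailing 'G' is dropped, because only full-length substrings are produced. If the leftover characters matter, one possible variant is sketched below. The name `chunks_with_tail` is hypothetical and not part of the answer; it simply lets range run to the end of the string so the last piece may be shorter.

```python
def chunks_with_tail(string, length):
    """Yield consecutive non-overlapping pieces; the last one may be shorter."""
    # Running range up to len(string) (instead of len(string) - length + 1)
    # keeps the trailing characters that do not fill a whole piece.
    for start in range(0, len(string), length):
        yield string[start:start + length]


print(list(chunks_with_tail("ABCDEFG", 3)))  # ['ABC', 'DEF', 'G']
```

Slicing past the end of a string is safe in Python, which is why no explicit bounds check is needed here.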
Answer 1 (score: 2)
A regular expression can do this quite easily:
>>> from re import findall
>>> somestr = 'ABCDABCDABCDABCDABCDABCD'
>>> # no overlapping
>>> for i in findall(".{3}", somestr):
...     print(i)
...
ABC
DAB
CDA
BCD
ABC
DAB
CDA
BCD
>>> # overlapping
>>> for i in findall("(?=(.{3}))", somestr):
...     print(i)
...
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
Note that I have set this up to use groups of 3. You can pick any number.
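To make "any number" concrete, the group size can be formatted into the pattern. The following is only a sketch: the helper name `regex_chunks` and its `overlap` flag are assumptions made for illustration, and `re.DOTALL` is added here (it is not in the answer) so that "." also matches newline characters.

```python
import re


def regex_chunks(text, size, overlap=False):
    """Return every run of `size` characters found by a regular expression."""
    if overlap:
        # The lookahead consumes nothing, so the scan resumes one character
        # later and the captured groups overlap.
        pattern = re.compile(r"(?=(.{%d}))" % size, re.DOTALL)
    else:
        # A plain match consumes `size` characters at a time.
        pattern = re.compile(r".{%d}" % size, re.DOTALL)
    return pattern.findall(text)


print(regex_chunks("ABCDEFG", 3))                # ['ABC', 'DEF']
print(regex_chunks("ABCDEFG", 3, overlap=True))  # ['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
```

As in the session above, characters that do not fill a whole group are left out of the result.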
Answer 2 (score: 1)
You can use itertools:
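import itertools
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    # n references to the same iterator, so each tuple takes n consecutive items
    args = [iter(iterable)] * n
    # zip_longest on Python 3; it is named izip_longest on Python 2
    return itertools.zip_longest(*args, fillvalue=fillvalue)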
This lets you do:
>>> [''.join(item) for item in grouper(somestr, 3)]
['ABC', 'DAB', 'CDA', 'BCD', 'ABC', 'DAB', 'CDA', 'BCD']
>>> [''.join(item) for item in grouper(somestr, 4)]
['ABCD', 'ABCD', 'ABCD', 'ABCD', 'ABCD', 'ABCD']
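Note that when the last chunk does not have enough characters you need to pass a string as fillvalue, because the default None cannot be joined by ''.join: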
>>> [''.join(item) for item in grouper(somestr, 5, fillvalue='')]
['ABCDA', 'BCDAB', 'CDABC', 'DABCD', 'ABCD']
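grouper only covers the non-overlapping case. For the overlapping case, one option in the same spirit is a sliding window built on collections.deque, modelled on the `sliding_window` recipe in the itertools documentation. This is a sketch of that recipe, not something the answer itself proposes.

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, n):
    """Yield overlapping tuples of n consecutive items."""
    it = iter(iterable)
    # Pre-fill with the first n - 1 items; maxlen=n makes each later append
    # push the oldest item out automatically.
    window = deque(islice(it, n - 1), maxlen=n)
    for item in it:
        window.append(item)
        yield tuple(window)


print(["".join(w) for w in sliding_window("ABCDEFG", 3)])
# ['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
```

Because it only consumes an iterator, this also works on inputs that cannot be sliced, such as file objects or generators.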
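Answer 3 (score: 1)
The simplest way to work with chunks of data is zip (itertools.izip on Python 2):
def chunks(iterable, size=2):
    # zip is lazy on Python 3; on Python 2 use itertools.izip instead
    it = iter(iterable)
    # the same iterator repeated size times, so each tuple holds consecutive items
    return zip(*[it] * size)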
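A short usage sketch for Python 3, where the built-in zip plays the role of izip. It yields tuples rather than strings, and since zip stops at the shortest input, an incomplete trailing chunk is silently dropped.

```python
def chunks(iterable, size=2):
    it = iter(iterable)
    # One iterator referenced `size` times: zip pulls `size` consecutive
    # items for every tuple it builds.
    return zip(*[it] * size)


print(["".join(c) for c in chunks("ABCDEFG", 3)])  # ['ABC', 'DEF']
print(list(chunks([1, 2, 3, 4])))                  # [(1, 2), (3, 4)]
```

If the leftover items are needed, zip_longest with a fillvalue is the usual alternative.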
Answer 4 (score: 1)
how_many = 3
every_other = 3
# each slice starts at position i and runs for how_many characters; i advances by every_other
three_at_a_time_skipping_three = [somestr[i:i+how_many] for i in range(0, len(somestr), every_other)]
print(three_at_a_time_skipping_three)
['ABC', 'DAB', 'CDA', 'BCD', 'ABC', 'DAB', 'CDA', 'BCD']
how_many = 4
four_at_a_time_skipping_three = [somestr[i:i+how_many] for i in range(0, len(somestr), every_other)]
print(four_at_a_time_skipping_three)
['ABCD', 'DABC', 'CDAB', 'BCDA', 'ABCD', 'DABC', 'CDAB', 'BCD']
Adjusting how_many and every_other will give you all sorts of results. The last slice above is shorter because the string runs out.
It is not the prettiest line, but the gist is that each slice of somestr starts at the position currently being iterated over and extends for how_many characters, while the step of range moves that starting position forward by every_other. Iterating over positions rather than characters matters: looking a character up with somestr.index(x) returns only its first occurrence, which is the wrong position once characters repeat.
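The two knobs can be wrapped in a small helper to see how they interact. This is a sketch only; the name `take_every` is hypothetical, and it computes the start positions with range.

```python
def take_every(text, how_many, every_other):
    """Slices of `how_many` characters whose starts are `every_other` apart."""
    return [text[start:start + how_many]
            for start in range(0, len(text), every_other)]


sample = "ABCDEFGH"
print(take_every(sample, 2, 2))  # ['AB', 'CD', 'EF', 'GH']   tiles the string
print(take_every(sample, 3, 2))  # ['ABC', 'CDE', 'EFG', 'GH'] slices overlap
print(take_every(sample, 1, 3))  # ['A', 'D', 'G']             gaps between slices
```

When how_many equals every_other the slices tile the string, a larger how_many makes them overlap, and a smaller one leaves gaps.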