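I have a question about how to check and compare two or more characters in a list in Python.
For example, I have the string "cdcdccddd". I turned this string into a list so the characters are easier to compare. The desired output is: c:1 d:1 c:1 d:1 c:2 d:3. In other words, it counts consecutive characters: if the first character differs from the second, its count is 1; if the second is the same as the third, the count goes up by 1, then the third has to be checked against the fourth, and so on.
This is the algorithm I have so far: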
text = "cdcdccddd"
l = []
l = list(text)
print list(text)
for n in range(0,len(l)):
    le = len(l[n])
    if l[n] == l[n+1]:
        le += 1
        if l[n+1] == l[n+2]:
            le += 1
        print l[n], ':' , le
    else:
        print l[n], ':', le
But it does not work properly: it compares the first and second elements, but not the second and third. For this input the output is:
c : 1
d : 1
c : 1
d : 1
c : 2
c : 1
d : 3
How can I improve this algorithm?
Thanks!
Answer 0 (score: 3)
You can use itertools.groupby:
from itertools import groupby
s = "cdcdccddd"
print([(k, sum(1 for _ in v)) for k,v in groupby(s)])
[('c', 1), ('d', 1), ('c', 1), ('d', 1), ('c', 2), ('d', 3)]
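Consecutive characters are grouped together, so each k is the character for that group, and calling sum(1 for _ in v) gives the length of each group, so we end up with (char, len(group)) pairs.
If we run it in ipython and call list on each v, it should be clear what is happening: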
In [3]: from itertools import groupby
In [4]: s = "cdcdccddd"
In [5]: [(k, list(v)) for k,v in groupby(s)]
Out[5]:
[('c', ['c']),
('d', ['d']),
('c', ['c']),
('d', ['d']),
('c', ['c', 'c']),
('d', ['d', 'd', 'd'])]
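To get the exact "c:1 d:1 ..." layout asked for in the question, the pairs can be joined into a single string. This is only a minimal sketch; the helper name format_runs is hypothetical and not part of itertools:

```python
from itertools import groupby


def format_runs(s):
    # groupby yields one (char, group) pair per run of identical
    # consecutive characters; len(list(group)) is the run length.
    return " ".join(
        "{}:{}".format(char, len(list(group))) for char, group in groupby(s)
    )


print(format_runs("cdcdccddd"))
# c:1 d:1 c:1 d:1 c:2 d:3
```

Using len(list(group)) instead of sum(1 for _ in v) is purely a matter of taste; both consume the group iterator once.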
We can also easily roll our own:
def my_groupby(s):
    # create an iterator
    it = iter(s)
    # set consec_count to one and pull the first char from s
    consec_count, prev = 1, next(it)
    # iterate over the rest of the string
    for ele in it:
        # if the previous and current chars are different,
        # yield the previous char with its count and reset
        if prev != ele:
            yield prev, consec_count
            consec_count = 0
        prev = ele
        consec_count += 1
    # yield the final run; prev is used instead of ele so that a
    # one-character string (where the loop never runs) also works
    yield prev, consec_count
Which gives us the same thing:
In [8]: list(my_groupby(s))
Out[8]: [('c', 1), ('d', 1), ('c', 1), ('d', 1), ('c', 2), ('d', 3)]
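For comparison with the loop in the question, here is a rough index-based sketch of the same counting. The likely problem with the original loop is that it looks at most two elements ahead and still advances n by one each time, so characters that were already counted get visited again. The function name count_runs is hypothetical, and this is one possible fix under that reading, not the only one:

```python
def count_runs(text):
    runs = []
    i = 0
    while i < len(text):
        j = i
        # move j forward to the last index of the current run
        while j + 1 < len(text) and text[j + 1] == text[i]:
            j += 1
        runs.append((text[i], j - i + 1))
        # jump past the whole run so no character is counted twice
        i = j + 1
    return runs


for char, count in count_runs("cdcdccddd"):
    print(char, ":", count)
```

The bounds check j + 1 < len(text) also avoids indexing past the end of the string, which l[n+1] and l[n+2] in the original loop do not guard against.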