Function to find the zero-based index of the longest run in a string in Python

Time: 2015-02-23 11:46:14

Tags: python list function indexing

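I am trying to write a function that finds the zero-based index of the longest run in a string. If several runs have the same length, the code should return the index of the first one.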

a=["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]

def longestrun(myList):
    result = None
    prev = None
    size = 0
    max_size = 0


    for i in myList:
        if i == prev:
            print (i)
            size += 1
            if size > max_size:
                print ('*******  '+ str(max_size))
                max_size = size 
        else:
            size = 0
        prev = i
    print (max_size+1)    
    return max_size+1


longestrun(a)

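I did some research and found this code, which I think can be used to find the longest run in my list, but I don't know how to use it to find the index of the first letter of the longest run. Can anyone help me or give me some advice on how to do this? Overall, the program's output should be the number 6, since the first 'd', at index 6, starts the longest run.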

Please note that I am a beginner, so I would appreciate answers that are as simple as possible and clearly explained.

5 answers:

Answer 0 (score: 2)

This should work:

def longestrun(myList):
    prev = None
    size = 0
    max_size = 0
    curr_pos = 0
    max_pos = 0

    for (index, i) in enumerate(myList):
        if i == prev:
            size += 1
            if size > max_size:
                max_size = size 
                max_pos = curr_pos
        else:
            size = 0
            curr_pos = index
        prev = i
    return max_pos
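
For a beginner it may help to see the same idea with the start of the current run tracked explicitly. This is only a sketch, not the answer's own code: the name longest_run_start is made up for illustration, and it assumes the input supports indexing (a list or a str).

```python
def longest_run_start(items):
    """Return the zero-based index where the longest run starts (first one on ties)."""
    best_start, best_len = 0, 0
    run_start = 0
    for index, item in enumerate(items):
        # A new run begins whenever the item differs from the one before it.
        if index > 0 and item != items[index - 1]:
            run_start = index
        run_len = index - run_start + 1
        # Strict ">" keeps the earliest run when two runs have equal length.
        if run_len > best_len:
            best_start, best_len = run_start, run_len
    return best_start

a = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]
print(longest_run_start(a))                     # 6
print(longest_run_start(["x", "x", "y", "y"]))  # 0 (tie: first run wins)
```

Comparing with the previous element by index avoids the prev = None sentinel, so a list that happens to contain None is handled like any other value.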

Answer 1 (score: 1)

If you want the starting index of the longest run:

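from collections import defaultdict
from operator import itemgetter

def longest(l):
    od = defaultdict(int)
    prev = None
    out = []
    for ind, ele in enumerate(l):
        if ele != prev and prev in od:
            # a run just ended: store (end index, item, run length)
            out.append((ind, prev, od[prev]))
            od[prev] = 0
        od[ele] += 1
        prev = ele
    if prev in od:  # record the final run, which the loop never closes
        out.append((len(l), prev, od[prev]))
    best = max(out, key=itemgetter(2)) # max by sequence length
    return best[0] - best[2] # subtract the length from the end index to get the start
print(longest(a))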

I store all the keys and lengths in case you actually want all of the information.

Without imports:

def longest1(l):
    prev = None
    seq = 0
    best = 0
    indx = None
    for ind, ele in enumerate(l):
        if ele != prev: # if we have a new char we have a new sequence
            # if the sequence that just ended is longer than our current best
            if seq > best:
                # update best to current len and set index to start of the sequence
                best = seq
                indx = ind - seq
            seq = 0 # reset seq count
        seq += 1
        prev = ele
    if seq > best: # the final run is never closed inside the loop
        indx = len(l) - seq
    return indx
print(longest1(a))

Some timings show that the simple loop is actually the most efficient:

In [23]: timeit longestrun_index(a)
100000 loops, best of 3: 9.07 µs per loop

In [24]: timeit longestrun(a)
100000 loops, best of 3: 2.54 µs per loop

In [25]: timeit longest(a)
100000 loops, best of 3: 6.79 µs per loop

In [26]: timeit longest1(a)
100000 loops, best of 3: 3.06 µs per loop
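
The timings above come from IPython's timeit magic. A rough way to reproduce this kind of comparison in a plain script is the standard timeit module. In the sketch below the names loop_version and groupby_version are hypothetical stand-ins for a plain loop and a groupby-based approach, and the absolute numbers will differ from machine to machine.

```python
import timeit
from itertools import groupby
from operator import itemgetter

a = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]

def loop_version(items):
    # Plain loop: remember where the current run started.
    prev, size, max_size, curr_pos, max_pos = None, 0, 0, 0, 0
    for index, item in enumerate(items):
        if item == prev:
            size += 1
            if size > max_size:
                max_size, max_pos = size, curr_pos
        else:
            size, curr_pos = 0, index
        prev = item
    return max_pos

def groupby_version(seq):
    # groupby over (index, item) pairs, grouped by item.
    groups = ((next(g), sum(1 for _ in g) + 1)
              for k, g in groupby(enumerate(seq), key=itemgetter(1)))
    (index, item), length = max(groups, key=itemgetter(1))
    return index

number = 10000
for func in (loop_version, groupby_version):
    # Best of 3 repeats, reported as microseconds per call.
    best = min(timeit.repeat(lambda: func(a), number=number, repeat=3))
    print(func.__name__, round(best / number * 1e6, 2), "µs per call")
```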

Answer 2 (score: 1)

You can use itertools.groupby() together with max() and enumerate() for this:

from itertools import groupby
from operator import itemgetter

def longestrun_index(seq):
    groups = ((next(g), sum(1 for _ in g)+1) for k, g in groupby(enumerate(seq),
                                                             key=itemgetter(1)))
    (index, item), length = max(groups, key=itemgetter(1))
    return index

a = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]    
print (longestrun_index(a))
# 6

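How does this work?

  • We first use itertools.groupby on enumerate(a) to form groups of equal items. However, since enumerate(a) yields both the index and the item from the list a (as (index, item) tuples), we need to tell groupby to group by the item, which is why I pass operator.itemgetter(1) as the key to groupby().
  • groupby() now yields two things: the key used for grouping and the group itself, as an iterator. We call next() on this iterator (the group) to get the first item together with its index, then count the remaining items in the group with sum() and a generator expression: sum(1 for _ in g)+1. The +1 compensates for the item we already pulled from the group with next().

  • With the index, key and count we now have a generator that yields ((index, key), length) when iterated.

  • Finally we use the built-in max(), again with itemgetter, to specify which element to compare on (length here) and find the desired index.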
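
The steps above can be made visible by printing the intermediate values. This sketch uses the same calls as the answer but builds a list instead of a generator so it can be printed; the variable names are only illustrative.

```python
from itertools import groupby
from operator import itemgetter

a = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]

# Step 1: group the (index, item) pairs by item.
for key, group in groupby(enumerate(a), key=itemgetter(1)):
    print(key, list(group))
# first lines printed:
# a [(0, 'a')]
# b [(1, 'b'), (2, 'b')]

# Step 2: keep the first (index, item) pair of each group plus the group length.
groups = [(next(g), sum(1 for _ in g) + 1)
          for k, g in groupby(enumerate(a), key=itemgetter(1))]
print(groups[3])  # ((6, 'd'), 4)

# Step 3: max() compares on the length and returns the first longest group.
(index, item), length = max(groups, key=itemgetter(1))
print(index, item, length)  # 6 d 4
```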

Answer 3 (score: 0)

You can use itertools.groupby to get a list of runs; then you just find the longest run and sum the lengths of all the runs before it:

from itertools import groupby

a = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]

# Get list of runs, each in the form (character, length)
runs = [(x, len(list(y))) for x,y in groupby(a)]

# Identify longest run
maxrun = max(runs, key=lambda x: x[1])

# Sum length of all runs before the max
index = 0
for run in runs:
    if run == maxrun: break
    index += run[1]

print(index)
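
The same idea can be wrapped in a function that keeps a running offset, so the list of runs is walked only once. This is a sketch under that assumption; the name longest_run_index is made up here.

```python
from itertools import groupby

def longest_run_index(items):
    """Return the start index of the longest run (first one on ties)."""
    best_start, best_len = 0, 0
    start = 0  # index at which the current run begins
    for _, group in groupby(items):
        length = sum(1 for _ in group)
        # Strict ">" keeps the earliest run when lengths are equal.
        if length > best_len:
            best_start, best_len = start, length
        start += length  # the next run begins right after this one
    return best_start

a = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]
print(longest_run_index(a))  # 6
```

Because it only iterates, this version also accepts a plain string or any other iterable.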

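Answer 4 (score: -1)

Use a defaultdict to build a dictionary holding the count of each item, then find the key with the highest value, then find the first occurrence of that item.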

from collections import defaultdict
import operator

letters=["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]

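# Count how many times each letter occurs in total
d = defaultdict(int)
for letter in letters:
    d[letter] += 1

# Letter with the highest total count (Python 3: items() replaces iteritems())
highest_run = max(d.items(), key=operator.itemgetter(1))[0]

# Index of the first occurrence of that letter
z_index = ''.join(letters).find(highest_run)
print(z_index)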
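
Note that a dictionary of counts measures how often each letter appears overall, which is not the same as the length of its longest consecutive run. The sketch below, using collections.Counter and itertools.groupby purely for illustration, shows that the two differ for this input.

```python
from collections import Counter
from itertools import groupby

letters = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]

# Total occurrences per letter: "c" appears most often overall.
totals = Counter(letters)
print(totals.most_common(1))  # [('c', 6)]

# Lengths of consecutive runs: the longest single run is "d".
runs = [(key, sum(1 for _ in group)) for key, group in groupby(letters)]
longest = max(runs, key=lambda run: run[1])
print(longest)  # ('d', 4)
```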

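The benefit of using modules is simplicity and development efficiency, plus the "standing on the shoulders of giants" effect of well-maintained, well-tested code. That is not to say you shouldn't take care, when using modules, to check that they are well maintained and unit tested.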