Question

Yes, this is homework. I just want to understand why this doesn't seem to work.

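I'm trying to find the longest substring of a string whose letters are in alphabetical order. I make a random list of letters and, say, the length is 19. When I run my code, it prints out indices 0 through 17. (I know this is because I subtract 1 from the range.) However, when I leave off the - 1, it tells me "string index out of range." Why does that happen?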

s = 'cntniymrmbhfinjttbiuqhib'
sub = ''
longest = []

for i in range(len(s) - 1):
    if s[i] <= s[i+1]:
        sub += s[i]
        longest.append(sub)
    elif s[i-1] <= s[i]:
        sub += s[i]
        longest.append(sub)
        sub = ' '
    else:
        sub = ' '
print(longest)
print ('Longest substring in alphabetical order is: ' + max(longest, key=len))

I also tried a few other approaches.

If I just write:

for i in s:

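it throws an error saying "string indices must be integers, not str." That seems like a simple way to iterate over a string, but how would I compare individual letters that way?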
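With for i in s, i is each character itself rather than an index, which is why s[i] raises that error. One way to compare neighboring letters without any indices (a minimal sketch, assuming you only need each letter and the one after it) is to pair the string with a copy of itself shifted by one:

```python
s = 'cntniymrmbhfinjttbiuqhib'

# In "for ch in s", ch is a one-character string, so s[ch] is an error.
# zip(s, s[1:]) yields (current, next) pairs and stops before it could
# run past the end of the string.
pairs = []
for ch, next_ch in zip(s, s[1:]):
    pairs.append((ch, next_ch, ch <= next_ch))

print(pairs[:3])  # [('c', 'n', True), ('n', 't', True), ('t', 'n', False)]
```

If you do need the position as well, enumerate(s) yields (index, character) pairs.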

By the way, this is Python 2.7.

Edit: I'm sure my if/elif statements could be improved, but this is the first thing I could think of. I can come back to that later if needed.

Answer 1

The problem is the line if s[i] <= s[i+1]:. When i=18 (the last iteration of the loop without the -1), i+1=19, which is out of range.

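Note that the line elif s[i-1] <= s[i]: may not be doing what you want either. When i=0, i-1 = -1. Python allows negative indices, which count from the end of the indexed object, so s[-1] is the last character in the string (s[-2] is the second to last, and so on).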
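Both effects can be seen in isolation with a tiny sketch (the string 'abc' is just an illustrative example):

```python
s = 'abc'

# Negative indices count from the end, so s[i-1] with i == 0 silently
# reads the last character instead of raising an error.
i = 0
print(s[i - 1])  # prints c

# An index equal to len(s) is one past the end and raises IndexError;
# this is what s[i+1] does on the last loop iteration without the -1.
error_message = None
try:
    s[len(s)]
except IndexError as exc:
    error_message = str(exc)
print(error_message)  # prints: string index out of range
```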

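A simpler way to get the previous and next characters is to use zip together with slices of the string that start one and two characters in, respectively.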

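If you haven't seen zip before, it works like this (it stops as soon as the shortest input runs out, so the 4 is never used):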

>>> for char, x in zip(['a','b','c'], [1,2,3,4]):
...     print char, x
...
a 1
b 2
c 3

So you could do:

for previous_char, char, next_char in zip(string, string[1:], string[2:]):
    pass  # compare previous_char, char and next_char here

This iterates over all three characters at once without running off either end.
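On a short string (here 'abcd', chosen only for illustration) the three slices line up so that each tuple is (previous, current, next). Note that the first and last characters never appear in the middle position, so they still need separate handling if they matter:

```python
string = 'abcd'

# Each tuple is (previous, current, next); zip stops at the shortest slice.
triples = list(zip(string, string[1:], string[2:]))
print(triples)  # [('a', 'b', 'c'), ('b', 'c', 'd')]
```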

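However, there is an even simpler way to do this. Instead of comparing the current character with its neighbors in the string, compare it with the last character of the current alphabetical substring, for example: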

s = "abcdabcdefa"
longest = [s[0]]
current = [s[0]]
for char in s[1:]:
    if char >= current[-1]: # current[-1] == current[len(current)-1]
        current.append(char)
    else:            
        current=[char]
    if len(longest) < len(current):
        longest = current
print longest

This avoids any fancy indexing.
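If you want the result as a string rather than a list, the same idea can be wrapped in a function. This is a minimal sketch: the name longest_alpha_run is hypothetical, and the empty-string check is an addition that the code above does not have:

```python
def longest_alpha_run(s):
    """Return the longest substring of s whose characters are in non-decreasing order."""
    if not s:
        return ''  # s[0] below would fail on an empty string
    longest = [s[0]]
    current = [s[0]]
    for char in s[1:]:
        if char >= current[-1]:
            current.append(char)   # still in order: extend the current run
        else:
            current = [char]       # order broken: start a new run at char
        if len(longest) < len(current):
            longest = current      # both names now refer to the same list
    return ''.join(longest)

print(longest_alpha_run("abcdabcdefa"))  # abcdef
```

Binding longest = current makes both names share one list. That is safe here because the shared list is only extended while it is the longest run, and current is rebound to a fresh list (not mutated) when a new run starts.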

Answer 2

I'm sure my if/elif statements could be improved, but this was the first thing I could think of. I can come back to it later if needed.

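@or1426's solution builds a list holding the current sorted run and assigns it to longest whenever it becomes longer than the best run so far. It creates a new list each time the sorted order breaks, and appends to a list for every character. This is actually quite fast in Python, but see below.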

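@Deej's solution keeps the current sorted sequence in a string variable, and every time a longer substring is found (even when it is just a continuation of the current sequence) the substring is saved to a list. The list ends up holding every sorted substring of the original string, and the longest one is found by calling max.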
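To see why saving a substring after every extension gets expensive, consider a fully sorted input: every prefix is stored, so the list holds 1 + 2 + ... + n = n(n+1)/2 characters. The sketch below only mimics that storage pattern, not the rest of Deej's code, and the helper name saved_prefix_chars is hypothetical:

```python
def saved_prefix_chars(s):
    """Total characters kept if every extension of the current run is appended to a list."""
    saved = []
    sub = ''
    for i, ch in enumerate(s):
        if i > 0 and ch < s[i - 1]:
            sub = ''          # run broken: start a new run
        sub += ch
        saved.append(sub)     # a new string object is stored on every step
    return sum(len(x) for x in saved)

n = 1000
print(saved_prefix_chars('a' * n))  # 500500, i.e. n*(n+1)//2
```

The quadratic growth on long sorted runs is consistent with the MemoryError reported below for 1000-character sorted chunks.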

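Here is a faster solution that only keeps track of the indices of the current longest sequence, and only updates longest when it finds a character that is out of sorted order: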

def bjorn4(s):
    # we start out with s[0] being the longest sorted substring (LSS)
    longest = (0, 1)    # the slice-indices of the longest sorted substring
    longlen = 1         # the length of longest
    cur_start = 0       # the slice-indices of the *current* LSS
    cur_stop = 1

    for ch in s[1:]:       # skip the first ch since we handled it above
        end = cur_stop-1   # cur_stop is a slice index, subtract one to get the last ch in the LSS
        if ch >= s[end]:   # if ch >= then we're still in sorted order..
            cur_stop += 1  # just extend the current LSS by one
        else:
            # we found a ch that is not in sorted order
            if longlen < (cur_stop-cur_start):
                # if the current LSS is longer than longest, then..
                longest = (cur_start, cur_stop)    # store current in longest
                longlen = longest[1] - longest[0]  # precompute longlen

            # since we can't add ch to the current LSS we must create a new current around ch
            cur_start, cur_stop = cur_stop, cur_stop+1

    # if the LSS is at the end, then we'll not enter the else part above, so
    # check for it after the for loop
    if longlen < (cur_stop - cur_start):
        longest = (cur_start, cur_stop)

    return s[longest[0]:longest[1]]

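How much faster? It's almost twice as fast as or1426 and three times as fast as deej. As always, it depends on your input: the more sorted substrings there are, the faster the algorithm above is compared to the others. For example, on an input string of length 100000 made of alternating runs of 100 random characters and 100 sorted characters, I get: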
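The harness used for these particular numbers isn't shown, so the following is only a sketch of one way to build such an input, assuming lowercase ASCII letters and equal-length chunks; the helper name alternating_input is hypothetical:

```python
import random
import string

def alternating_input(total_len, chunk):
    """Alternate chunks of random letters with chunks of sorted letters."""
    parts = []
    length = 0
    sorted_turn = False
    while length < total_len:
        letters = [random.choice(string.ascii_lowercase) for _ in range(chunk)]
        if sorted_turn:
            letters.sort()         # this chunk is in alphabetical order
        parts.append(''.join(letters))
        length += chunk
        sorted_turn = not sorted_turn
    return ''.join(parts)[:total_len]

data = alternating_input(100000, 100)
print(len(data))  # 100000
```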

bjorn4: 2.4350001812
or1426: 3.84699988365
deej  : 7.13800001144

If I change it to 1000 random characters and 1000 sorted characters, I get:

bjorn4: 23.129999876
or1426: 38.8380000591
deej  : MemoryError

Update: here is a further optimized version of my algorithm, along with the comparison code:

import random, string
from itertools import izip_longest
import timeit

def _randstr(n):
    ls = []
    for i in range(n):
        ls.append(random.choice(string.lowercase))
    return ''.join(ls)

def _sortstr(n):
    return ''.join(sorted(_randstr(n)))

def badstr(nish):
    res = ""
    for i in range(nish):
        res += _sortstr(i)
        if len(res) >= nish:
            break
    return res

def achampion(s):
    start = end = longest = 0
    best = ""
    for c1, c2 in izip_longest(s, s[1:]):
        end += 1
        if c2 and c1 <= c2:
            continue
        if (end-start) > longest:
            longest = end - start
            best = s[start:end]
        start = end
    return best

def bjorn(s):
    cur_start = 0
    cur_stop = 1
    long_start = cur_start
    long_end = cur_stop

    for ch in s[1:]:      
        if ch < s[cur_stop-1]:
            if (long_end-long_start) < (cur_stop-cur_start):
                long_start = cur_start
                long_end = cur_stop
            cur_start = cur_stop
        cur_stop += 1

    if (long_end-long_start) < (cur_stop-cur_start):
        return s[cur_start:cur_stop]
    return s[long_start:long_end]


def or1426(s):
    longest = [s[0]]
    current = [s[0]]
    for char in s[1:]:
        if char >= current[-1]: # current[-1] == current[len(current)-1]
            current.append(char)
        else:            
            current=[char]
        if len(longest) < len(current):
            longest = current
    return ''.join(longest)

if __name__ == "__main__":
    print 'achampion:', round(min(timeit.Timer(
        "achampion(rstr)",
        setup="gc.enable();from __main__ import achampion, badstr; rstr=badstr(30000)"
    ).repeat(15, 50)), 3)

    print 'bjorn:', round(min(timeit.Timer(
        "bjorn(rstr)",
        setup="gc.enable();from __main__ import bjorn, badstr; rstr=badstr(30000)"
    ).repeat(15, 50)), 3)

    print 'or1426:', round(min(timeit.Timer(
        "or1426(rstr)",
        setup="gc.enable();from __main__ import or1426, badstr; rstr=badstr(30000)"
    ).repeat(15, 50)), 3)

Output:

achampion: 0.274
bjorn: 0.253
or1426: 0.486

Changing the data to random:

achampion: 0.350
bjorn: 0.337
or1426: 0.565

And sorted:

achampion: 0.262
bjorn: 0.245
or1426: 0.503
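The comparison code above is Python 2 only (print statements, string.lowercase, itertools.izip_longest). A sketch of the two fastest functions ported to Python 3, assuming that is where you want to run them; the logic is unchanged and only zip_longest is swapped in for izip_longest:

```python
from itertools import zip_longest

def achampion(s):
    start = end = longest = 0
    best = ""
    # Pair each character with the next one; the last pair is (last_char, None).
    for c1, c2 in zip_longest(s, s[1:]):
        end += 1
        if c2 and c1 <= c2:
            continue              # still ascending: keep extending the run
        if (end - start) > longest:
            longest = end - start
            best = s[start:end]
        start = end               # the next run starts after the break
    return best

def bjorn(s):
    cur_start, cur_stop = 0, 1    # slice indices of the current sorted run
    long_start, long_end = cur_start, cur_stop
    for ch in s[1:]:
        if ch < s[cur_stop - 1]:  # run broken: keep it if it is the longest so far
            if (long_end - long_start) < (cur_stop - cur_start):
                long_start, long_end = cur_start, cur_stop
            cur_start = cur_stop
        cur_stop += 1
    # The final run never hits the break branch, so check it here.
    if (long_end - long_start) < (cur_stop - cur_start):
        return s[cur_start:cur_stop]
    return s[long_start:long_end]

print(achampion("abcdabcdefa"), bjorn("abcdabcdefa"))  # abcdef abcdef
```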

"No, no, it's not dead, it's resting."

Answer 3

Now that Deej has posted an answer, I feel more comfortable answering the homework. Just by reordering @Deej's logic, you can simplify it to:

sub = ''
longest = []
for i in range(len(s)-1):  # -1 simplifies the if condition
    sub += s[i]
    if s[i] <= s[i+1]:
        continue           # Keep adding to sub until condition fails
    longest.append(sub)    # Only add to longest when condition fails
    sub = ''
longest.append(sub + s[-1])  # the loop stops before the last character, so close the final run here

max(longest, key=len)

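But as @thebjorn mentioned, this has the problem of keeping every ascending partition in a list (in memory). You can fix that by using a generator; I'm only including the rest for instructional purposes: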

def alpha_partition(s):
    sub = ''
    for i in range(len(s)-1):
        sub += s[i]
        if s[i] <= s[i+1]:
            continue
        yield sub
        sub = ''
    if s:
        yield sub + s[-1]  # the final run ends at the last character

max(alpha_partition(s), key=len)

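This is certainly not the fastest solution (it does string construction and indexing), but the change is quite simple: use zip to avoid indexing into the string, and use indices to avoid building and concatenating strings: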

from itertools import izip_longest   # For py3.X use zip_longest
def alpha_partition(s):
    start = end = 0
    for c1, c2 in izip_longest(s, s[1:]):
        end += 1
        if c2 and c1 <= c2:
            continue
        yield s[start:end]
        start = end

max(alpha_partition(s), key=len)

This should run quite efficiently, and only slightly slower than @thebjorn's iterative index approach because of the generator overhead.

Using s * 100:
alpha_partition(): 1000 loops, best of 3: 448 µs per loop
@thebjorn: 1000 loops, best of 3: 389 µs per loop

For reference, here is the generator converted into an iterative function:

from itertools import izip_longest   # For py3.X use zip_longest
def best_alpha_partition(s):
    start = end = longest = 0
    best = ""
    for c1, c2 in izip_longest(s, s[1:]):
        end += 1
        if c2 and c1 <= c2:
            continue
        if (end-start) > longest:
            longest = end - start
            best = s[start:end]
        start = end
    return best
best_alpha_partition(s)

best_alpha_partition(): 1000 loops, best of 3: 306 µs per loop

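Personally I prefer the generator form, because you can use exactly the same generator to find the minimum, the top 5, and so on, whereas the iterative function can only do one thing.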
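A sketch of that reuse, assuming Python 3 (zip_longest instead of izip_longest) and the zip-based alpha_partition generator shown above; min, max and heapq.nlargest are standard library functions that all accept a key argument:

```python
import heapq
from itertools import zip_longest

def alpha_partition(s):
    """Yield each maximal run of s whose characters are in non-decreasing order."""
    start = end = 0
    for c1, c2 in zip_longest(s, s[1:]):
        end += 1
        if c2 and c1 <= c2:
            continue
        yield s[start:end]
        start = end

s = 'cntniymrmbhfinjttbiuqhib'
# Each call creates a fresh generator, so the same function serves every query.
longest = max(alpha_partition(s), key=len)
shortest = min(alpha_partition(s), key=len)
top5 = heapq.nlargest(5, alpha_partition(s), key=len)
print(longest, shortest, top5)
```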

Answer 4

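OK, so after reading your answers and trying all sorts of things, I finally came up with a solution that gets me what I need. It's not the prettiest code, but it works. I'm sure the solutions mentioned would work too, but I couldn't quite understand them. Here's what I did: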

s = 'inaciaebganawfiaefc'
sub = ''
longest = []
for i in range(len(s)):
    if (i+1) < len(s) and s[i] <= s[i+1]:
        sub += s[i]
        longest.append(sub)
    elif i > 0 and s[i-1] <= s[i]:  # i > 0 so that s[i-1] never wraps around to s[-1]
        sub += s[i]
        longest.append(sub)
        sub = ''
    else:
        sub = ''
print ('Longest substring in alphabetical order is: ' + max(longest, key=len))

Why won't my for loop work? (Python)

4 answers: