Manually removing duplicate spaces in a string

Asked: 2015-05-28 21:12:06

Tags: python string python-2.7

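I have a question about a piece of code. I am working through string exercises in Python. I think I have the right logic, but for some reason the output built inside the for loop is not returned correctly; the global value is returned instead. I am not very familiar with Python. Is there a way to fix this?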

def song_decoder(song):
    global Ret

    Ret = ""

    Ret = song.replace("WUB", " ")
    Ret = Ret.strip()
    Ret += "1"

    space = False
    for i in range(0, len(Ret)):
        if Ret[i] == "1":
            Ret = Ret[:i]
            break
        elif Ret[i] == " ":
            if space is False:
                space = True
            else:
                if i+1 == len(Ret):
                    Ret = Ret[:i]
                else:
                    Ret = Ret[:i] + Ret[(i+1):]
        else:
            space = False

    return Ret

Test code:

def test_song_decoder(self):
    self.assertEquals(song_decoder("AWUBBWUBC"), "A B C","WUB should be replaced by 1 space")
    self.assertEquals(song_decoder("AWUBWUBWUBBWUBWUBWUBC"), "A B C","multiples WUB should be replaced by only 1 space")
    self.assertEquals(song_decoder("WUBAWUBBWUBCWUB"), "A B C","heading or trailing spaces should be removed")

The second test fails; it returns 'A  B  C' (two spaces between the letters) instead.

2 answers:

Answer 0 (score: 8)

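First, you don't need to make Ret global here, so it is best to remove that line.

Second, there is a missing test that would have given you another hint: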

>>> song_decoder("AWUBBWUBC")
'A B C'
>>> song_decoder("AWUBWUBBWUBWUBC")
'A B C'
>>> song_decoder("AWUBWUBWUBBWUBWUBWUBC")
'A  B  C'

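As you can see, two WUBs are correctly replaced by a single space. The problem appears when there are three. That should tell you that the space detection stops working properly after a removal has been made. The reason is actually fairly simple: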

# you iterate over the *initial* length of Ret
for i in range(0, len(Ret)):
    # ...
    elif Ret[i] == " ":
        if space is False:
            space = True
        else:
            # when you hit a space and you have seen a space directly
            # before, you check whether it is the last character…
            if i+1 == len(Ret):
                Ret = Ret[:i]
            else:
                # … and otherwise remove the character at index `i`
                Ret = Ret[:i] + Ret[(i+1):]

    # now at the end of the loop, `i` is incremented to `i + 1`
    # although you have already removed the character at index `i`
    # making the next character you would have to check belong to
    # index `i` too

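The result is that you skip the character that directly follows the space you just removed. With three spaces in a row, deleting the second space moves the third one into index `i`, and the loop then advances past it without ever examining it. So this loop cannot collapse three spaces into one.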
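To make the skipped character visible, here is a minimal sketch that mimics the loop from the question and records what each index actually sees. The helper name `visited_characters` is made up for this illustration, and the loop is simplified (the end-of-string branch is left out because the "1" sentinel is always the last character).

```python
def visited_characters(text):
    """Mimic the question's loop and record the (index, character) pairs it examines."""
    text += "1"  # same sentinel trick as in the question
    seen = []
    space = False
    for i in range(len(text)):  # the range is computed once, from the initial length
        ch = text[i]
        seen.append((i, ch))
        if ch == "1":
            break
        if ch == " ":
            if space:
                # delete the character at index i; everything after it shifts left
                text = text[:i] + text[i + 1:]
            else:
                space = True
        else:
            space = False
    return seen


# "A", three spaces, "B"
trace = visited_characters("A   B")
print(trace)
```

For this input the trace contains only two of the three spaces: after the deletion at index 2, the third space sits at index 2 as well, but the loop has already moved on to index 3.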

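In general, it is a very bad idea to iterate over something while you are modifying it. In your case, the loop runs over the initial length of the string, but the string keeps getting shorter. You should really avoid doing that.

Instead of iterating over the Ret string while you change it, iterate over the original string, which stays constant: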

def song_decoder(song):
    # replace the WUBs and strip spaces
    song = song.replace("WUB", " ").strip() 
    ret = ''

    space = False
    # instead of iterating over the length, just iterate over
    # the characters of the string
    for c in song:
        # since we iterate over the string, we don’t need to check
        # where it ends

        # check for spaces
        if c == " ":
            # space is a boolean, so don’t compare it against booleans
            if not space:
                space = True
            else:
                # if we saw a space before and this character is a space
                # we can just skip it
                continue
        else:
            space = False

        # if we got here, we didn’t skip a later space, so we should
        # include the current character
        ret += c

    return ret

Answer 1 (score: 2)

You are going to too much trouble to merge multiple spaces into one:

def song_decoder(song, delimiter="WUB"):
    splits = song.split(delimiter)  # instead of replacing, just split on your delimiter
    cleaned = filter(None, splits)  # remove empty elements caused by consecutive WUBs
    return ' '.join(cleaned)        # join them up with a single space in between
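
To see why this works, it can help to look at the intermediate values. The sketch below is only an illustration: `song_decoder_steps` is a hypothetical helper, and it uses a list comprehension in place of `filter(None, splits)` so that the cleaned parts can be shown as a list on both Python 2 and Python 3.

```python
def song_decoder_steps(song, delimiter="WUB"):
    """Return the intermediate values of the split/filter/join approach."""
    splits = song.split(delimiter)               # consecutive delimiters yield empty strings
    cleaned = [part for part in splits if part]  # drop the empty strings
    return splits, cleaned, " ".join(cleaned)


splits, cleaned, result = song_decoder_steps("WUBAWUBWUBWUBBWUBCWUB")
print(splits)
print(cleaned)
print(result)
```

Leading, trailing, and repeated delimiters all show up as empty strings in `splits`, so dropping them handles stripping and collapsing in one step.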