Question

I tried exercise 10.15 in Think Python and wrote the following code:

def turn_str_to_list(string):
    res = []
    for letter in string:
        res.append(letter)
    return res

def sort_and_unique (t):
    t.sort()
    for i in range (0, len(t)-2, 1):
        for j in range (i+1, len(t)-1, 1):
            if t[i]==t[j]:
                del t[j]
    return t

line=raw_input('>>>')
t=turn_str_to_list(line)
print t
print sort_and_unique(t)

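I used a double 'for' structure to eliminate any duplicate elements in the sorted list. However, when I run it, I keep getting the wrong output. If I enter 'committee', the output is ['c', 'e', 'i', 'm', 'o', 't', 't'], which is wrong because it still contains a double 't'. I tried different inputs: sometimes the program fails to pick up duplicate letters in the middle of the list, and it always fails to pick up the ones at the end. What am I missing? Thanks.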

Answer 1

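Your program does not remove all the duplicate letters because calling del t[j] inside the nested for loops makes it skip letters.

I added some print statements to help illustrate this: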

def sort_and_unique (t):
    t.sort()
    for i in range (0, len(t)-2, 1):
        print "i: %d" % i
        print t
        for j in range (i+1, len(t)-1, 1):
            print "\t%d %s len(t):%d" % (j, t[j], len(t))
            if t[i]==t[j]:
                print "\tdeleting %c" % t[j]
                del t[j]
    return t

Output:

>>>committee
['c', 'o', 'm', 'm', 'i', 't', 't', 'e', 'e']
i: 0
['c', 'e', 'e', 'i', 'm', 'm', 'o', 't', 't']
        1 e len(t):9
        2 e len(t):9
        3 i len(t):9
        4 m len(t):9
        5 m len(t):9
        6 o len(t):9
        7 t len(t):9
i: 1
['c', 'e', 'e', 'i', 'm', 'm', 'o', 't', 't']
        2 e len(t):9
        deleting e
        3 m len(t):8
        4 m len(t):8
        5 o len(t):8
        6 t len(t):8
        7 t len(t):8
i: 2
['c', 'e', 'i', 'm', 'm', 'o', 't', 't']
        3 m len(t):8
        4 m len(t):8
        5 o len(t):8
        6 t len(t):8
i: 3
['c', 'e', 'i', 'm', 'm', 'o', 't', 't']
        4 m len(t):8
        deleting m
        5 t len(t):7
        6 t len(t):7
i: 4
['c', 'e', 'i', 'm', 'o', 't', 't']
        5 t len(t):7
i: 5
['c', 'e', 'i', 'm', 'o', 't', 't']
i: 6
['c', 'e', 'i', 'm', 'o', 't', 't']
['c', 'e', 'i', 'm', 'o', 't', 't']

Every time del t[j] is called, the list becomes one element shorter, but the j variable of the inner for loop keeps advancing.

For example:

i=1, j=2, t = ['c', 'e', 'e', 'i', 'm', 'm', 'o', 't', 't']

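It sees that t[1] == t[2] (both are 'e'), so it deletes t[2].

Now t = ['c', 'e', 'i', 'm', 'm', 'o', 't', 't']

However, the code carries on with i=1, j=3, comparing 'e' with 'm' and skipping 'i'.

Finally, it does not catch the last two 't's, because by the time i=5, len(t) is 7, so the inner for loop runs over range(6, 6, 1), which is empty, and its body is never executed.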
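One way to keep deleting in place without skipping anything is to advance the index only when nothing was deleted. The following is a minimal sketch of that idea, not code from the question; the name sort_and_unique_in_place is made up for illustration, and print is called with parentheses so the snippet also runs under Python 3.

```python
def sort_and_unique_in_place(t):
    """Sort t and drop duplicates in place without skipping elements."""
    t.sort()
    i = 0
    # len(t) is re-evaluated on every pass because the list shrinks.
    while i < len(t) - 1:
        if t[i] == t[i + 1]:
            # Delete the duplicate but keep i where it is: the next
            # element has just moved into position i + 1.
            del t[i + 1]
        else:
            i += 1
    return t


print(sort_and_unique_in_place(list("committee")))
# ['c', 'e', 'i', 'm', 'o', 't']
```

Because the list is sorted, comparing each element only with its right-hand neighbour is enough, so a single loop replaces the nested pair.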

Answer 2

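In Python you can use built-in data structures and library functions such as set() and list().

Your turn_str_to_list() can be done with list(). Maybe you know this already and wanted to do it yourself.

Using the list() and set() APIs (the resulting order is arbitrary; use sorted() if you need sorted output):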

line=raw_input('>>>')
print list(set(line))

Your sort_and_unique() has O(n^2) complexity. One way to make it cleaner:

def sort_and_unique2(t):
    t.sort()
    res = []
    for i in t:
        if i not in res:
            res.append(i)

    return res

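This is still O(n^2), because the lookup (i not in res) takes linear time, but the code looks cleaner. Deleting from a list is O(n), so it is better to append to a new list instead, since append is O(1). For the complexity of the list API, see https://wiki.python.org/moin/TimeComplexity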
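If the linear lookup is the concern, the membership test can be moved to a set, which has average constant-time lookups according to the same wiki page. This is a sketch under that assumption rather than part of the original answer; unique_in_order is a hypothetical name, and unlike sort_and_unique2 it keeps first-seen order instead of sorting.

```python
def unique_in_order(items):
    """Return the distinct items in first-seen order."""
    seen = set()   # fast membership tests
    res = []       # preserves the order of first appearance
    for item in items:
        if item not in seen:
            seen.add(item)
            res.append(item)
    return res


print(unique_in_order("committee"))
# ['c', 'o', 'm', 'i', 't', 'e']
```

Wrapping the result in sorted() gives the same output as sort_and_unique2.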

Answer 3

Here you go:

In [1]: word = 'committee'

In [3]: word_ = set(word)

In [4]: word_
Out[4]: {'c', 'e', 'i', 'm', 'o', 't'}

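The standard way to check for unique elements in Python is to use a set. The set constructor takes any sequential object (more generally, any iterable). A string is a sequence of ASCII codes (or Unicode code points), so it qualifies.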
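As a small illustration of checking for uniqueness with a set, the sketch below compares the number of items with the number of distinct items. The helper name has_duplicates is hypothetical and not part of the answer.

```python
def has_duplicates(iterable):
    """Return True if any item occurs more than once."""
    items = list(iterable)  # materialize so generators can be counted too
    return len(set(items)) != len(items)


print(has_duplicates("committee"))  # True
print(has_duplicates("abc"))        # False
```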

If you have any further questions, please leave a comment.

Answer 4

Solution explained:

>>> word = "committee"

Convert the string to a list of characters:

>>> clst = list(word)
>>> clst
['c', 'o', 'm', 'm', 'i', 't', 't', 'e', 'e']

Use set to get only the unique items:

>>> unq_clst = set(clst)
>>> unq_clst
{'c', 'e', 'i', 'm', 'o', 't'}

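As it turns out (thanks to Blckknght), the list step is not necessary, and we can do this:

>>> unq_clst = set(word)
>>> unq_clst
{'c', 'e', 'i', 'm', 'o', 't'}

Both set and list take an iterable as their argument, and iterating over a string yields one character after another.

Sorting: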

>>> sorted(unq_clst)
['c', 'e', 'i', 'm', 'o', 't']

One-line version:

>>> sorted(set("COMMITTEE"))
['C', 'E', 'I', 'M', 'O', 'T']

Answer 5

You can try the following snippet:

s = "committee"
# set() accepts the string directly, so no list() call is needed
res = sorted(set(s))

Answer 6

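So you want an explanation of what is wrong with your code. Here you go:

Before we dive into coding, make test cases

Having test cases from the very beginning will make our coding faster.

For testing, I will write a small utility function: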

def textinout(text):
    return "".join(sort_and_unique(list(text)))

This allows quick tests such as:

>>> textinout("committee")
"ceimot"

and another helper function for readable error traces:

def checkit(textin, expected):
    msg = "For input '{textin}' we expect '{expected}', got '{result}'"
    result = textinout(textin)
    assert result == expected, msg.format(textin=textin, expected=expected, result=result)

and a function holding the test cases:

def testit():
    checkit("abcd", 'abcd')
    checkit("aabbccdd", 'abcd')
    checkit("a", 'a')
    checkit("ddccbbaa", 'abcd')
    checkit("ddcbaa", 'abcd')
    checkit("committee", 'ceimot')

Let us run the first test with the existing function:

def sort_and_unique (t):
    t.sort()
    for i in range (0, len(t)-2, 1):
        for j in range (i+1, len(t)-1, 1):
            if t[i]==t[j]:
                del t[j]
    return t

Now we can test it:

testit()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-11-291a15d81032> in <module>()
----> 1 testit()

<ipython-input-4-d8ad9abb3338> in testit()
      1 def testit():
      2         checkit("abcd", 'abcd')
----> 3         checkit("aabbccdd", 'abcd')
      4         checkit("a", 'a')
      5         checkit("ddccbbaa", 'abcd')

<ipython-input-10-620ac3b14f51> in checkit(textin, expected)
      2     msg = "For input '{textin}' we expect '{expected}', got '{result}'"
      3     result = textinout(textin)
----> 4     assert result == expected, msg.format(textin=textin, expected=expected, result=result)

AssertionError: For input 'aabbccdd' we expect 'abcd', got 'abcdd'

Reading the last line of the error trace, we know what went wrong.
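An assert stops at the first failing case. If you would rather see every failing case at once, a variant like the following sketch collects them instead. The names CASES and failing_cases are made up for this illustration; the function under test is the original one from the question, with print written so it also runs under Python 3.

```python
def sort_and_unique(t):
    # The original function from the question, unchanged.
    t.sort()
    for i in range(0, len(t) - 2, 1):
        for j in range(i + 1, len(t) - 1, 1):
            if t[i] == t[j]:
                del t[j]
    return t


# Same inputs and expected outputs as in testit().
CASES = [
    ("abcd", "abcd"),
    ("aabbccdd", "abcd"),
    ("a", "a"),
    ("ddccbbaa", "abcd"),
    ("ddcbaa", "abcd"),
    ("committee", "ceimot"),
]


def failing_cases(func, cases):
    """Return (input, expected, result) for every case that fails."""
    failures = []
    for textin, expected in cases:
        result = "".join(func(list(textin)))
        if result != expected:
            failures.append((textin, expected, result))
    return failures


for textin, expected, result in failing_cases(sort_and_unique, CASES):
    print("For input %r we expect %r, got %r" % (textin, expected, result))
```

With the original function this reports four of the six cases, all of them ending in a duplicated last letter.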

General comments on your code

Accessing list members by index

In most cases this is inefficient and makes the code harder to read.

Instead of:

lst = ["a", "b", "c"]
for i in range(len(lst)):
    itm = lst[i]
    # do something with the itm

You should use:

lst = ["a", "b", "c"]
for itm in lst:
    # do something with the itm
    print itm

If you need to access a subset of the list, use slicing.

Instead of:

for i in range (0, len(lst)-2, 1):
    itm = lst[i]

Use:

for itm in lst[:-2]:
    # do something with the itm
    print itm

If you really need to know the position of the item being processed for the inner loop, use enumerate.

Instead of:

lst = ["a", "b", "c", "d", "e"]
for i in range(0, len(lst)):
    for j in range (i+1, len(lst)-1, 1):
        itm_i = lst[i]
        itm_j = lst[j]
        # do something

Use enumerate, which turns each list item into a tuple (index, item):

lst = ["a", "b", "c", "d", "e"]
for i, itm_i in enumerate(lst):
    for itm_j in lst[i+1:-1]:
        print itm_i, itm_j
        # do something
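To make the (index, item) tuples concrete, here is a small sketch written for Python 3 (it also runs under Python 2). It is an illustration only: the inner slice here runs to the end of the list, unlike the lst[i+1:-1] above, and the variable pairs is a made-up name.

```python
lst = ["a", "b", "c"]

# enumerate yields (index, item) tuples
print(list(enumerate(lst)))
# [(0, 'a'), (1, 'b'), (2, 'c')]

# Pair every item with each item that comes after it.
pairs = []
for i, itm_i in enumerate(lst):
    for itm_j in lst[i + 1:]:
        pairs.append((itm_i, itm_j))

print(pairs)
# [('a', 'b'), ('a', 'c'), ('b', 'c')]
```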

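Manipulating the list being processed

You are looping over the list and suddenly deleting items from it. Modifying a list during iteration is generally best avoided. If you have to do it, think twice about how, for example by iterating backwards so that you do not modify the part that is about to be processed in the next iteration.

As an alternative to deleting items from the list you are iterating over, you can record your findings (such as the duplicate items) in another list and use it after you have left the loop.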
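The backward-iteration idea can be sketched as follows. This is an illustration rather than the code the answer goes on to develop, and sort_and_unique_backwards is a hypothetical name. Deleting at index i only shifts the items after i, which have already been processed.

```python
def sort_and_unique_backwards(lst):
    """Sort lst and delete duplicates in place, walking from the end."""
    lst.sort()
    # Visit indices len(lst)-1 down to 1; index 0 has no left neighbour.
    for i in range(len(lst) - 1, 0, -1):
        if lst[i] == lst[i - 1]:
            # Safe: only positions already visited are shifted.
            del lst[i]
    return lst


print(sort_and_unique_backwards(list("committee")))
# ['c', 'e', 'i', 'm', 'o', 't']
```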

How to rewrite your code

def sort_and_unique (lst):
    lst.sort()
    to_remove = []
    for i, itm_i in enumerate(lst[:-2]):
        for j, itm_j in enumerate(lst[i+1: -1]):
            if itm_i == itm_j:
                to_remove.append(itm_j)
    # now we are out of loop and can modify the lst
    # note, we loop over one list and modify another, this is safe
    for itm in to_remove:
        lst.remove(itm)
    return lst

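Reading the code, the problem turns out to be this: you never touch the last item of the sorted list. That is why the extra "t" was not removed, because after sorting it is alphabetically the last item.

So your code can be corrected like this: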

def sort_and_unique (lst):
    lst.sort()
    to_remove = []
    for i, itm_i in enumerate(lst[:-1]):
        for j, itm_j in enumerate(lst[i+1:]):
            if itm_i == itm_j:
                to_remove.append(itm_j)
    for itm in to_remove:
        lst.remove(itm)
    return lst

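With this change the code passes the test cases, which you should confirm by calling testit(). Note that this nested version records every equal pair, so a letter that occurs three or more times is removed too often (or raises ValueError); the test cases only contain doubled letters. The zip-based version further down compares only neighbouring items and does not have this problem.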

>>> testit()

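Silent test output is what we dream of.

Having a test function makes further code changes easy, because you can quickly check whether things still work as expected.

Anyway, the code can be shortened by using zip to get the tuples (itm_i, itm_j):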

def sort_and_unique (lst):
    lst.sort()
    to_remove = []
    for itm_i, itm_j in zip(lst[:-1], lst[1:]):
        if itm_i == itm_j:
            to_remove.append(itm_j)
    for itm in to_remove:
        lst.remove(itm)
    return lst

Test it:

>>> testit()
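To see why the zip version works, it can help to look at what zip(lst[:-1], lst[1:]) produces: each item paired with its right-hand neighbour. The sketch below is an illustration only; neighbours and duplicates are made-up names, and list() is used because zip is lazy under Python 3.

```python
lst = sorted("committee")
# ['c', 'e', 'e', 'i', 'm', 'm', 'o', 't', 't']

# Pair each item with the one that follows it.
neighbours = list(zip(lst[:-1], lst[1:]))
print(neighbours[:3])
# [('c', 'e'), ('e', 'e'), ('e', 'i')]

# In a sorted list, a duplicate always equals its left neighbour.
duplicates = [b for a, b in neighbours if a == b]
print(duplicates)
# ['e', 'm', 't']
```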

Or use a list comprehension:

def sort_and_unique (lst):
    lst.sort()
    to_remove = [itm_j for itm_i, itm_j in zip(lst[:-1], lst[1:]) if itm_i == itm_j]
    for itm in to_remove:
        lst.remove(itm)
    return lst

Test it:

>>> testit()

Since the list comprehension (using []) finishes building its result before that result is used, we can drop another line:

def sort_and_unique (lst):
    lst.sort()
    for itm in [itm_j for itm_i, itm_j in zip(lst[:-1], lst[1:]) if itm_i == itm_j]:
        lst.remove(itm)
    return lst

Test it:

>>> testit()

Note that so far the code still reflects your original algorithm, with only two bugs removed:

- not manipulating the list we are iterating over
- also taking into account the last item of the list
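If you are willing to step away from the original algorithm, the standard library's itertools.groupby can collapse runs of equal neighbours in a sorted sequence. This is a different technique from the one developed above, shown only as a sketch; sorted_unique is a hypothetical name, and the function returns a new list instead of modifying its argument.

```python
from itertools import groupby


def sorted_unique(text):
    """Return the distinct items of text as a new sorted list."""
    # groupby yields one (key, group) pair per run of equal items,
    # so each distinct item of the sorted input appears once as a key.
    return [key for key, _group in groupby(sorted(text))]


print(sorted_unique("committee"))
# ['c', 'e', 'i', 'm', 'o', 't']
```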

Eliminating duplicate elements in a list
