Question

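I'm new to Python and have written a practice program that checks a password against a list of 4 million words. My original solution looked like this (it prints True if the password is in the list):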

import sys
from bisect import bisect_left

script, password, pwlist = sys.argv
password = password + "\r\n"  # assumes each line in the word list ends with "\r\n"

l = [line for line in open(pwlist)]
l.sort()  # Must be sorted for bisect_left to work

print((password <= l[-1]) and (l[bisect_left(l, password)] == password))

Then I realized I could use the index method instead, like this:

import sys

script, password, pwlist = sys.argv
password = password + "\r\n"  # assumes each line in the word list ends with "\r\n"

l = [line for line in open(pwlist)]  # Note we don't need to sort this time

# Catch the "not in list" exception
try:
    print((password <= l[-1]) and (l[l.index(password)] == password))
except ValueError:
    print("False")

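My second version cut the execution time dramatically, because the list no longer needs to be sorted. Am I approaching this the right way? How does the index() method work? Surely, if it works on an unsorted list, it can't be doing a binary search. Any advice would be appreciated.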

Answer 1

Yes. In the first example you sort the list and then apply an algorithm yourself, namely binary search.

In the second example you simply use Python's built-in list.index() method.

The second way is faster because the cost of sorting the list, O(N*log(N)), is greater than the cost of a linear search through it, O(N).
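To put rough numbers on this for the 4-million-word list in the question (an order-of-magnitude estimate that counts comparisons only and ignores constant factors):

$$
\begin{aligned}
N &= 4\times10^{6}, \qquad \log_2 N \approx 21.9\\
\text{sort: } N\log_2 N &\approx 8.8\times10^{7} \text{ comparisons}\\
\text{one linear scan (worst case): } N &= 4\times10^{6} \text{ comparisons}
\end{aligned}
$$

So sorting alone costs roughly 22 times as many comparisons as a single worst-case scan, before the binary search even starts.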

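Consider this: if you have to check multiple passwords, it is better to sort the list once, keep the sorted list, and then run a binary search on it for each lookup.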
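A minimal sketch of that approach, assuming the words are already loaded into a list (the names `build_index` and `contains` are hypothetical, chosen for illustration). The sort is paid once; each later lookup is a `bisect_left` call, so k lookups cost about O(N log N + k log N) instead of O(k N) for repeated `index()` calls.

```python
from bisect import bisect_left


def build_index(words):
    """Sort the word list once (O(N log N)); reuse the result for every lookup."""
    return sorted(words)


def contains(sorted_words, word):
    """Binary search for word in an already-sorted list (O(log N) per lookup)."""
    i = bisect_left(sorted_words, word)
    # bisect_left returns the insertion point; the word is present only if
    # that position exists and holds an equal item.
    return i < len(sorted_words) and sorted_words[i] == word


if __name__ == "__main__":
    # Hypothetical small word list standing in for the 4-million-line file.
    words = ["letmein", "password", "123456", "qwerty", "dragon"]
    index = build_index(words)
    for candidate in ["qwerty", "hunter2", "123456"]:
        print(candidate, contains(index, candidate))
```

The `i < len(sorted_words)` check plays the same role as the `password <= l[-1]` guard in the question: it prevents indexing past the end when the word would sort after every entry.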

Answer 2

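Binary search is the better choice when the data structure is already sorted, because a lookup then takes O(log N). When you have to sort the list first, the sort alone takes O(N * log N), which is slower than a single linear search, O(N).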
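To see where the O(log N) comes from, here is a hand-written binary search (a sketch; `binary_search_steps` is a hypothetical name) using the same halving loop as the pure-Python version of `bisect.bisect_left`, with a counter added. Each pass at least halves the remaining range, so on 4 million sorted items it never needs more than 22 iterations (the bit length of 4,000,000).

```python
def binary_search_steps(sorted_seq, target):
    """Hand-written binary search (same loop shape as bisect_left).

    Returns (found, steps), where steps counts loop iterations, i.e. how many
    times the search range was halved.
    """
    lo, hi = 0, len(sorted_seq)
    steps = 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_seq[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_seq) and sorted_seq[lo] == target
    return found, steps


if __name__ == "__main__":
    n = 4_000_000
    data = range(n)  # already sorted; a range avoids building a 4M-item list
    print(binary_search_steps(data, 3_141_592))
    print(binary_search_steps(data, -1))
```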

Answer 3

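In the worst case, list.index is O(N). Looking at its CPython implementation below, it is an optimized C function that returns the index of the first matching item in the list. So it's the best option here; binary search, by contrast, shines when you're working with a list that is already sorted.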

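/* CPython's list.index() (Objects/listobject.c, older Python 3 versions) */
static PyObject *
listindex(PyListObject *self, PyObject *args)
{
    Py_ssize_t i, start=0, stop=Py_SIZE(self);
    PyObject *v;

    /* Parse index(value[, start[, stop]]) */
    if (!PyArg_ParseTuple(args, "O|O&O&:index", &v,
                                _PyEval_SliceIndex, &start,
                                _PyEval_SliceIndex, &stop))
        return NULL;
    /* Negative start/stop count from the end and are clamped at 0 */
    if (start < 0) {
        start += Py_SIZE(self);
        if (start < 0)
            start = 0;
    }
    if (stop < 0) {
        stop += Py_SIZE(self);
        if (stop < 0)
            stop = 0;
    }
    /* Linear scan: no binary search, just compare each item with v */
    for (i = start; i < stop && i < Py_SIZE(self); i++) {
        int cmp = PyObject_RichCompareBool(self->ob_item[i], v, Py_EQ);
        if (cmp > 0)
            return PyLong_FromSsize_t(i);   /* first match found */
        else if (cmp < 0)
            return NULL;                    /* comparison raised an error */
    }
    PyErr_Format(PyExc_ValueError, "%R is not in list", v);
    return NULL;
}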
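For readers who don't read C, here is a rough pure-Python transcription of that loop (a sketch for illustration; `list_index` is a hypothetical name, and the real method is implemented in C). It answers the question directly: `index()` doesn't binary search, it walks the list from `start` and returns at the first item that compares equal.

```python
def list_index(items, value, start=0, stop=None):
    """Rough pure-Python transcription of the C listindex() loop.

    Illustration only: the real list.index() is implemented in C.
    """
    n = len(items)
    if stop is None:
        stop = n  # C default: stop = Py_SIZE(self)
    # Negative bounds count from the end and are clamped at 0.
    if start < 0:
        start = max(start + n, 0)
    if stop < 0:
        stop = max(stop + n, 0)
    i = start
    # Re-check len(items) each pass, as the C loop re-reads Py_SIZE(self).
    while i < stop and i < len(items):
        # PyObject_RichCompareBool(..., Py_EQ) treats identical objects as
        # equal before trying ==, so check identity first.
        if items[i] is value or items[i] == value:
            return i
        i += 1
    raise ValueError(f"{value!r} is not in list")
```

For example, `list_index(["a", "b", "c", "b"], "b", 2)` returns 3, the same as `list.index`.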

But in your first version you do a lot of extra work.

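First, you don't need a list comprehension to get all the lines of the file; you can simply use the file.readlines() method. You also sort the list, which makes your first approach much slower than the second.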
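A minimal sketch of the trimmed-down program, assuming one word per line (the names `load_words`, `is_listed`, and `check.py` are hypothetical). Here `read().splitlines()` stands in for `readlines()` because it also removes the line endings, so the password no longer needs `"\r\n"` appended. It skips the sort, and `in` replaces the `index()`/`try` combination.

```python
import sys


def load_words(path):
    """Return the word list with line endings removed.

    readlines() would keep the trailing newline on every entry;
    read().splitlines() drops it.
    """
    with open(path) as f:
        return f.read().splitlines()


def is_listed(password, words):
    """Linear scan, O(N): the same work list.index() does, but `in` returns
    a bool instead of raising ValueError, so no try/except is needed."""
    return password in words


if __name__ == "__main__":
    if len(sys.argv) == 3:
        _, password, pwlist = sys.argv
        print(is_listed(password, load_words(pwlist)))
    else:
        print("usage: python check.py PASSWORD WORDLIST", file=sys.stderr)
```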

P.S. If you only want to check membership, a more Pythonic way is to keep your items in a set object and use the in operator, which is O(1) on average.
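A short sketch of the set approach (the helper names and sample data are hypothetical). Building the set is a single O(N) pass, cheaper than sorting, and after that each membership test is a hash lookup, O(1) on average, so it also wins when many passwords need checking. The trade-off is memory: a set of 4 million strings takes noticeably more than the list.

```python
def build_password_set(words):
    """Build the set once: O(N) time, plus extra memory for the hash table."""
    return set(words)


def check_all(password_set, candidates):
    """Each `in` test on a set is a hash lookup, O(1) on average."""
    return {c: c in password_set for c in candidates}


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real word list.
    leaked = build_password_set(["letmein", "password", "123456", "qwerty"])
    print(check_all(leaked, ["qwerty", "hunter2"]))  # {'qwerty': True, 'hunter2': False}
```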

Finding the index of an element in a list: binary search or the index function?
