Question

What is the big O of the following if statement?

if "pl" in "apple":
   ...

What is the overall big O of how Python determines whether the string "pl" is found in the string "apple", or of any other substring-in-string search?

Is this the most efficient way to test whether a substring is in a string? Does it use the same algorithm as .find()?

Answer 1

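The time complexity is O(N) on average and O(NM) in the worst case (N being the length of the longer string, M the length of the shorter string you search for).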
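To see where a bound like O(NM) comes from, here is a minimal sketch that counts character comparisons in a plain left-to-right search. This is a naive search written only for illustration, not CPython's algorithm; the name count_comparisons and the test strings are made up for this example.

```python
def count_comparisons(s, p):
    """Naive left-to-right search; return (index, number of character comparisons)."""
    n, m = len(s), len(p)
    comparisons = 0
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if s[i + j] != p[j]:
                break
            j += 1
        if j == m:
            return i, comparisons
    return -1, comparisons


n, m = 1000, 10
# Friendly input: the first character of p never occurs in s,
# so every window costs a single comparison (about N in total).
print(count_comparisons("a" * n, "b" * m))              # (-1, 991)
# Adversarial input: every window matches m - 1 characters before failing,
# so the total is about N * M.
print(count_comparisons("a" * n, "a" * (m - 1) + "b"))  # (-1, 9910)
```

The skipping rules in CPython's search exist to stay close to the first case on typical input.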

The same algorithm is used for str.index(), str.find(), str.__contains__() (the in operator) and str.replace(); it is a simplification of Boyer-Moore, taking ideas from the Boyer–Moore–Horspool and Sunday algorithms.
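As a quick illustration of those four entry points, the following sketch runs each of them on the strings from the question. It only shows that they agree on where the substring is; it says nothing about their relative speed.

```python
s = "apple"
sub = "pl"

print(sub in s)              # True
print(s.__contains__(sub))   # True (what the in operator calls)
print(s.find(sub))           # 2
print(s.index(sub))          # 2
print(s.replace(sub, "PL"))  # apPLe

# For a missing substring, find returns -1 while index raises ValueError.
print(s.find("xy"))          # -1
try:
    s.index("xy")
except ValueError as exc:
    print("index raised:", exc)
```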

See the original stringlib discussion post, as well as the fastsearch.h source code; the base algorithm has not changed since its introduction in Python 2.5 (apart from some low-level optimisations and corner-case fixes).

The post includes a Python-code outline of the algorithm:

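def find(s, p):
    # find first occurrence of p in s; return -1 if not found
    n = len(s)
    m = len(p)
    if m == 0:
        return 0 # the empty pattern matches at index 0, like str.find
    # horspool shift for the last pattern character; the outline wrote this as
    # delta1(p)[p[m-1]] without defining delta1, so it is computed inline here
    # (rfind returns -1 when the character does not recur, which gives skip = m)
    skip = m - 1 - p.rfind(p[m-1], 0, m-1)
    i = 0
    while i <= n-m:
        # Python has no terminator to read at s[n], so at the last window
        # the look-ahead is treated as "in p" and the sunday shift is not taken
        next_in_p = i+m >= n or s[i+m] in p
        if s[i+m-1] == p[m-1]: # (boyer-moore)
            # potential match
            if s[i:i+m-1] == p[:m-1]:
                return i
            if not next_in_p:
                i = i + m + 1 # (sunday)
            else:
                i = i + skip # (horspool)
        else:
            # skip
            if not next_in_p:
                i = i + m + 1 # (sunday)
            else:
                i = i + 1
    return -1 # not found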

The post also includes a speed comparison.
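One way to sanity-check an outline like the one above is to compare it against str.find on many small random inputs. The sketch below is only a test harness under stated assumptions: first_disagreement and naive_find are hypothetical names invented here, and naive_find merely stands in for whatever find(s, p) implementation you want to check.

```python
import random


def first_disagreement(candidate, trials=2000, seed=0):
    """Return the first (s, p) where candidate(s, p) != s.find(p), or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        # A two-letter alphabet makes partial matches and repeats frequent.
        s = "".join(rng.choice("ab") for _ in range(rng.randint(0, 12)))
        p = "".join(rng.choice("ab") for _ in range(rng.randint(1, 4)))
        try:
            got = candidate(s, p)
        except IndexError:
            got = "IndexError"  # an out-of-range read counts as a mismatch
        if got != s.find(p):
            return s, p
    return None


def naive_find(s, p):
    """Stand-in candidate: try every start position in turn."""
    for i in range(len(s) - len(p) + 1):
        if s[i:i + len(p)] == p:
            return i
    return -1


print(first_disagreement(naive_find))  # None means no mismatch was found
```

Passing your own function in place of naive_find reports the first input on which it disagrees with the built-in, which is usually enough to locate an off-by-one in the skip logic.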

Answer 2

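In Python 3.4.2 it looks like both resort to the same function, but the timings may still differ. For example, s.find first has to look up the find method on the string, and so on.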

The algorithm used is a mix between Boyer-Moore and Horspool.
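The extra work done by s.find can be made visible by disassembling both expressions. This is a rough sketch using the standard dis module; opnames is a helper name made up here, and the exact opcode names printed depend on the CPython version.

```python
import dis


def opnames(source):
    """Compile an expression and return the names of its bytecode instructions."""
    code = compile(source, "<demo>", "eval")
    return [instruction.opname for instruction in dis.get_instructions(code)]


# The in operator compiles to a single containment instruction,
# while s.find("pl") needs an attribute lookup followed by a call.
print(opnames('"pl" in s'))
print(opnames('s.find("pl")'))
```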

Answer 3

You can use timeit and test it yourself:

maroun@DQHCPY1:~$ python -m timeit 's = "apple";s.find("pl")'
10000000 loops, best of 3: 0.125 usec per loop
maroun@DQHCPY1:~$ python -m timeit 's = "apple";"pl" in s'
10000000 loops, best of 3: 0.0371 usec per loop

Using in is indeed faster (0.0371 usec versus 0.125 usec).
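The same measurement can be scripted with the timeit module instead of the command line. This is a minimal sketch; the loop count and the third variant (a pre-bound find, which avoids the repeated method lookup) are choices made for this example, and the absolute numbers will vary by machine and Python version.

```python
import timeit

setup = 's = "apple"'
number = 200_000  # loops per measurement, chosen arbitrarily

t_in = timeit.timeit('"pl" in s', setup=setup, number=number)
t_find = timeit.timeit('s.find("pl")', setup=setup, number=number)
# Bind the method once in setup so the timed statement skips the lookup.
t_bound = timeit.timeit('find("pl")', setup=setup + "; find = s.find", number=number)

for label, seconds in [("in", t_in), ("s.find", t_find), ("bound find", t_bound)]:
    print(f"{label:<12}{seconds / number * 1e6:.4f} usec per loop")
```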

For the actual implementation, you can look at the code itself.

Answer 4

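I think the best way to find out is to look at the source. This appears to implement __contains__ (note that it is the version for bytes objects, bytes_contains):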

static int
bytes_contains(PyObject *self, PyObject *arg)
{
    Py_ssize_t ival = PyNumber_AsSsize_t(arg, PyExc_ValueError);
    if (ival == -1 && PyErr_Occurred()) {
        Py_buffer varg;
        Py_ssize_t pos;
        PyErr_Clear();
        if (PyObject_GetBuffer(arg, &varg, PyBUF_SIMPLE) != 0)
            return -1;
        pos = stringlib_find(PyBytes_AS_STRING(self), Py_SIZE(self),
                             varg.buf, varg.len, 0);
        PyBuffer_Release(&varg);
        return pos >= 0;
    }
    if (ival < 0 || ival >= 256) {
        PyErr_SetString(PyExc_ValueError, "byte must be in range(0, 256)");
        return -1;
    }

    return memchr(PyBytes_AS_STRING(self), (int) ival, Py_SIZE(self)) != NULL;
}

As for stringlib_find(), it uses fastsearch().
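The branches of bytes_contains above can be observed from Python. The following sketch uses a small helper, membership, which is a name invented for this example; it evaluates item in data and reports either the result or the type of exception raised.

```python
def membership(item, data):
    """Evaluate `item in data`; return the result or the exception type name."""
    try:
        return item in data
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


data = b"apple"
print(membership(b"pl", data))     # True: bytes-like argument, substring search
print(membership(ord("p"), data))  # True: integer argument, single-byte lookup
print(membership(256, data))       # ValueError: integer outside range(0, 256)
print(membership("pl", data))      # TypeError: str is not a bytes-like object
```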
