Question

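I have a very minimal piece of code that performs autocomplete for queries entered by the user, using historical data of names (close to 1000) stored in a list. Right now it suggests the lexicographically smallest match.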

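The names stored in the list are fictitious phrases such as "top 10 places to visit" and "Population > 1000".

Queries given by the user can be:

queries = ["10", "greater", ">", "7 w"]

Current implementation: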

class Index(object):

    def __init__(self, words):
        index = {}
        for w in sorted(words, key=str.lower, reverse=True):
            lw = w.lower()
            for i in range(1, len(lw) + 1):
                index[lw[:i]] = w

        self.index = index

    def by_prefix(self, prefix):
        """Return lexicographically smallest word that starts with a given
        prefix.
        """ 
        return self.index.get(prefix.lower(), 'no matches found')

def typeahead(usernames, queries):
    users = Index(usernames)
    # print() with parentheses runs on both Python 2 and Python 3
    print("\n".join(users.by_prefix(q) for q in queries))

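This works fine if the query starts with the beginning of a stored name. But if an arbitrary input is given (a query taken from somewhere in the middle of a name), no suggestion is produced. It also does not recognize numbers, although it does not raise an error either.

I would like to know whether there is a way to add the features above to improve my existing implementation.

Any help is much appreciated.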

Answer 1

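It's O(n), but it works. Your function checks whether a name starts with the prefix, but the behaviour you describe is checking whether the string contains the query.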

def __init__(self, words):
    self.index = sorted(words, key=str.lower, reverse=True)

def by_prefix(self, prefix):
    for item in self.index:
        if prefix in item:
            return item

This gives:

top 10 places to visit
Cost greater than 100
Population > 1000
show me 7 wonders of the world

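Just for the record, on my computer it takes 0.175 seconds to run 5 queries against 1,000,005 records, where the 5 matching records are the last 5 in the list (the worst case).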
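
A minimal sketch of how the two methods above could be put together as a complete class, written for Python 3. It keeps the answer's reverse sort so that it reproduces the output shown above. Two things are my own assumptions and not part of the answer: the comparison is case-insensitive (the index in the question lowercases everything), and a fallback string is returned when nothing matches. The method name by_substring is hypothetical, and the sample names are the ones that appear in the output above.

```python
class Index(object):
    """Linear-scan index: O(n) per query, matches anywhere in the name."""

    def __init__(self, words):
        # Same ordering as in the answer: descending, case-insensitive.
        self.words = sorted(words, key=str.lower, reverse=True)

    def by_substring(self, query, default='no matches found'):
        # Assumption: compare in lowercase so "Greater" also finds "greater".
        q = query.lower()
        for w in self.words:
            if q in w.lower():
                return w
        # Assumption: same fallback string as the question's by_prefix().
        return default


names = ["show me 7 wonders of the world", "top 10 places to visit",
         "Population > 1000", "Cost greater than 100"]
index = Index(names)
results = [index.by_substring(q) for q in ["10", "greater", ">", "7 w"]]
print("\n".join(results))
```

Note that the first hit depends on the sort order. "10" is also contained in "Cost greater than 100" and "Population > 1000", so dropping reverse=True would make the scan return "Cost greater than 100" for that query, which is the lexicographically smallest match.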

Answer 2

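If you don't care about performance, you can test if prefix in item: for every item in the list names. This statement matches whenever prefix is a substring of the string item, for example: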

prefix   item       match
'foo'    'foobar'   True
'bar'    'foobar'   True
'ob'     'foobar'   True
...

I think this is the simplest way to achieve this, but obviously not the fastest.
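
As a rough illustration of the check described above, the sketch below collects every item that contains the typed text. The function name suggestions and the sample list are made up for this example. Keep in mind that the in test is case-sensitive.

```python
def suggestions(names, prefix):
    """Return every name that contains prefix, in the original list order."""
    return [item for item in names if prefix in item]


# Hypothetical sample data, only for illustration.
names = ['foobar', 'barfoo', 'baz']

for prefix in ('foo', 'ob', 'ba', 'xyz'):
    print(repr(prefix), suggestions(names, prefix))
```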

Answer 3

Another option is to add more entries to the index, e.g. for the item "most beautiful places":

"most beautiful places"
"beautiful places"
"places"

If you do this, you also get a match when you start typing a word that is not the first word of the sentence. You can modify your code like this to do so:

class Index(object):

    def __init__(self, words):
        index = {}
        for w in sorted(words, key=str.lower, reverse=True):
            lw = w.lower()
            tokens = lw.split(' ')
            for j in range(len(tokens)):
                w_part = ' '.join(tokens[j:])
                for i in range(1, len(w_part) + 1):
                    index[w_part[:i]] = w

        self.index = index

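The downside of this approach is that the index becomes very large. You could also combine it with the approach pointed out by Keatinge: for every word, store only its 2-character prefix as a key in the index dictionary, and store the list of queries containing that prefix as the value.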
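
One possible reading of that combined approach, as a sketch for Python 3 rather than a definitive implementation. The class name TwoCharIndex and its methods are hypothetical. I am assuming that "2-character prefix" means the first two characters of the text starting at each word, and that queries shorter than two characters simply fall back to a linear scan. The sample names are the ones that appear elsewhere on this page.

```python
from collections import defaultdict


class TwoCharIndex(object):
    """Small index: 2-character keys map to candidate names, which are
    then filtered with a linear check over that (short) candidate list."""

    def __init__(self, names):
        self.names = sorted(names, key=str.lower)
        self.index = defaultdict(list)
        for name in self.names:
            # A set, so each name is stored at most once per key.
            for key in {s[:2] for s in self._suffixes(name)}:
                self.index[key].append(name)

    @staticmethod
    def _suffixes(name):
        # "most beautiful places" -> the three word-aligned suffixes.
        tokens = name.lower().split(' ')
        return [' '.join(tokens[j:]) for j in range(len(tokens))]

    def search(self, query):
        q = query.lower()
        if len(q) >= 2:
            candidates = self.index.get(q[:2], [])
        else:
            candidates = self.names  # assumption: too short, scan everything
        return [n for n in candidates
                if any(s.startswith(q) for s in self._suffixes(n))]


names = ["show me 7 wonders of the world", "most beautiful places",
         "top 10 places to visit", "Population > 1000",
         "Cost greater than 100"]
idx = TwoCharIndex(names)
for q in ["10", "greater", ">", "7 w", "places"]:
    print(repr(q), idx.search(q))
```

The index stays small because it holds at most one key per word of each name instead of one key per prefix length. Like the larger index above, it only matches queries that begin at the start of a word, so a query such as "ond" returns nothing.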

Python - Autocomplete with number expansion and suggestions for random queries
