Question

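I want to implement the vim commandT plugin in emacs. This code is mostly a translation of its matcher.

I have some elisp here that is still too slow to use on my netbook. How can I speed it up?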

(eval-when-compile (require 'cl))
(defun commandT-fuzzy-match (choices search-string)
  (sort (loop for choice in choices
              for score = (commandT-fuzzy-score choice search-string (commandT-max-score-per-char choice search-string))
              if (> score 0.0) collect (list score choice))
        #'(lambda (a b) (> (first a) (first b)))
        ))

(defun* commandT-fuzzy-score (choice search-string &optional (score-per-char (commandT-max-score-per-char choice search-string)) (choice-pointer 0) (last-found nil))
  (condition-case error
      (loop for search-char across search-string
            sum (loop until (char-equal search-char (elt choice choice-pointer))
                      do (incf choice-pointer)
                      finally return (let ((factor (cond (last-found (* 0.75 (/ 1.0 (- choice-pointer last-found))))
                                                         (t 1.0))))
                                       (setq last-found choice-pointer)
                                       (max (commandT-fuzzy-score choice search-string score-per-char (1+ choice-pointer) last-found)
                                            (* factor score-per-char)))))
    (args-out-of-range 0.0)   ; end of string hit without match found.
    ))

(defun commandT-max-score-per-char (choice search-string)
  (/ (+ (/ 1.0 (length choice)) (/ 1.0 (length search-string))) 2))

Be sure to compile this part, since that alone already helps a lot. And a benchmark:

(let ((choices (split-string (shell-command-to-string "curl http://sprunge.us/FcEL") "\n")))
  (benchmark-run-compiled 10
      (commandT-fuzzy-match choices "az")))
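
To time a matcher without fetching the file list over the network, a harness along the following lines might be used. This is only a sketch: the `my-bench-*` names and the synthetic path format are hypothetical and not part of the original post, and timings on synthetic data will not match those on the real list.

```elisp
(require 'benchmark)

(defun my-bench-make-choices (n)
  "Return N synthetic path-like strings (made-up test data)."
  (let ((choices nil))
    (dotimes (i n)
      (push (format "src/module%d/file%d.el" (% i 17) i) choices))
    (nreverse choices)))

(defun my-bench-matcher (matcher search-string &optional n reps)
  "Time MATCHER, a function of (CHOICES SEARCH-STRING).
N is the number of synthetic choices (default 1000) and REPS the
number of repetitions (default 10).  Return the list produced by
`benchmark-run': (ELAPSED GC-COUNT GC-ELAPSED)."
  (let ((choices (my-bench-make-choices (or n 1000)))
        ;; `benchmark-run' wants a number or a symbol as its count,
        ;; so the default is resolved into a variable first.
        (reps (or reps 10)))
    (benchmark-run reps
      (funcall matcher choices search-string))))
```

With the functions above loaded and byte-compiled, `(my-bench-matcher #'commandT-fuzzy-match "az")` would time the matcher from the question.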

Answer 1

Here are some micro-optimizations you can try:

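Use car-less-than-car instead of your lambda expression. This has no visible effect, since the time is not spent in sort but in commandT-fuzzy-score. (Note that car-less-than-car sorts in ascending order, whereas the lambda sorts in descending order, so apply nreverse to the result if the best match should come first.)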
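Use defun rather than defun*: optional arguments with a non-nil default have a non-negligible hidden cost. This cuts the GC cost almost in half (and you started with more than 10% of the time spent in the GC).
(* 0.75 (/ 1.0 XXX)) is equal to (/ 0.75 XXX).
Use eq rather than char-equal (this changes the behavior to always be case-sensitive, though). This makes a fairly large difference.
Use aref rather than elt.
I don't understand why you pass last-found in your recursive call, so I obviously don't fully understand what your algorithm is doing. But assuming that was an error, you can turn it into a local variable rather than passing it as an argument. This saves you time.
I don't understand why you make a recursive call for every search-char that you find, rather than only for the first one. Another way to look at it is that your max compares a "single-char score" with a "whole search-string score", which seems rather odd. If you change your code to do the max outside of the two loops, with the recursive call on (1+ first-found), that speeds it up by a factor of 4 in my test case.
The multiplication by score-per-char can be moved outside of the loop (this doesn't seem to be true for your original algorithm).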
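
Taken together, these suggestions might lead to something like the sketch below. It is one possible reading, not the answerer's code: the `my-fuzzy-*` names are hypothetical, matching is case-sensitive because of `eq`, the recursion is replaced by an outer loop that restarts at `(1+ first-found)`, and the search resumes one character past each match, which is an assumption that differs from the question's code.

```elisp
(defun my-fuzzy-greedy-score (choice search-string start)
  "Greedily match SEARCH-STRING in CHOICE, starting at index START.
Return (SCORE . FIRST-FOUND), or nil if a character cannot be found."
  (let ((pos start)
        (len (length choice))
        (n (length search-string))
        (i 0)
        (score 0.0)
        (first-found nil)
        (last-found nil)
        (ok t))
    (while (and ok (< i n))
      (let ((c (aref search-string i)))
        ;; `eq' on characters is case-sensitive, unlike `char-equal'.
        (while (and (< pos len) (not (eq c (aref choice pos))))
          (setq pos (1+ pos)))
        (if (>= pos len)
            (setq ok nil)               ; ran off the end: no match
          ;; The first character scores 1.0, later ones 0.75 / distance.
          (setq score (+ score (if last-found
                                   (/ 0.75 (- pos last-found))
                                 1.0)))
          (unless first-found (setq first-found pos))
          (setq last-found pos
                pos (1+ pos)
                i (1+ i)))))
    (and ok first-found (cons score first-found))))

(defun my-fuzzy-score (choice search-string)
  "Best greedy score over all start positions, scaled once at the end."
  (let ((best 0.0)
        (start 0)
        (result nil))
    ;; The max is taken outside the two inner loops; each retry
    ;; starts just after the previous first match.
    (while (setq result (my-fuzzy-greedy-score choice search-string start))
      (setq best (max best (car result))
            start (1+ (cdr result))))
    (if (> best 0.0)
        ;; Same scale factor as commandT-max-score-per-char.
        (* best (/ (+ (/ 1.0 (length choice))
                      (/ 1.0 (length search-string)))
                   2))
      0.0)))
```

For example, `(my-fuzzy-score "abc" "abc")` scores higher than `(my-fuzzy-score "a-b-c" "abc")`, because the matched characters are adjacent, and a choice that lacks one of the search characters scores 0.0.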

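Also, Elisp as implemented in Emacs is pretty slow, so you are often better off using "big primitives", so as to spend less time interpreting Elisp (byte-)code and more time running C code. Here is an alternative implementation (not of your original algorithm, but of the one I got after moving the max outside of the loops) that uses regexp pattern matching to do the inner loop: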

(defun commandT-fuzzy-match-re (choices search-string)
  (let ((search-re (regexp-quote (substring search-string 0 1)))
        (i 1))
    (while (< i (length search-string))
      (setq search-re (concat search-re
                              (let ((c (aref search-string i)))
                                (format "[^%c]*\\(%s\\)"
                                        c (regexp-quote (string c))))))
      (setq i (1+ i)))

    (sort
     (delq nil
           (mapcar (lambda (choice)
                     (let ((start 0)
                           (best 0.0))
                       (while (string-match search-re choice start)
                         (let ((last-found (match-beginning 0)))
                           (setq start (1+ last-found))
                           (let ((score 1.0)
                                 (i 1)
                                 (choice-pointer nil))
                             (while (setq choice-pointer (match-beginning i))
                               (setq i (1+ i))
                               (setq score (+ score (/ 0.75 (- choice-pointer last-found))))
                               (setq last-found choice-pointer))
                             (setq best (max best score)))))
                       (when (> best 0.0)
                         (list (* (commandT-max-score-per-char
                                   choice search-string)
                                  best)
                               choice))))
                   choices))
     #'car-less-than-car)))
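
To see what the regexp in the implementation above looks like, the construction step can be pulled out into a helper of its own. This is a sketch with hypothetical `my-fuzzy-*` names; binding `case-fold-search` to nil is an assumption made here so that the result does not depend on the current buffer's setting, and the search string is assumed to be non-empty.

```elisp
(defun my-fuzzy-build-re (search-string)
  "Build the regexp for SEARCH-STRING, which must be non-empty.
Every character after the first gets its own group, preceded by
a run of characters that are not that character."
  (let ((re (regexp-quote (substring search-string 0 1)))
        (i 1))
    (while (< i (length search-string))
      (let ((c (aref search-string i)))
        (setq re (concat re (format "[^%c]*\\(%s\\)"
                                    c (regexp-quote (string c))))))
      (setq i (1+ i)))
    re))

(defun my-fuzzy-match-positions (search-string choice)
  "Return the positions of the matched characters in CHOICE, or nil."
  (let ((case-fold-search nil))         ; assumption: case-sensitive
    (when (string-match (my-fuzzy-build-re search-string) choice)
      (let ((positions (list (match-beginning 0)))
            (i 1))
        ;; `match-beginning' returns nil once I exceeds the group count.
        (while (match-beginning i)
          (push (match-beginning i) positions)
          (setq i (1+ i)))
        (nreverse positions)))))

(my-fuzzy-build-re "abc")                   ; => "a[^b]*\\(b\\)[^c]*\\(c\\)"
(my-fuzzy-match-positions "abc" "xaybzc")   ; => (1 3 5)
```

The differences between consecutive positions are the distances that the scoring loop divides 0.75 by.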

Speeding up string matching in emacs
