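How to delete duplicate lines in Emacs

Time: 2012-10-24 09:53:13

Tags: emacs elisp

I have a text with many lines. How can I delete the duplicate lines in Emacs, using a built-in command or an elisp package rather than an external tool?

For example: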

this is line a
this is line b
this is line a

Deleting line 3 (identical to line 1) gives:

this is line a
this is line b

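5 answers:

Answer 0: (score: 27)

If you are using Emacs 24.4 or newer, the easiest way is to use the new delete-duplicate-lines function. Note that

  • it works on a region, not on the buffer, so select the desired text first
  • it keeps the relative order of the originals and deletes the duplicates

For example, if your input is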

test
dup
dup
one
two
one
three
one
test
five

M-x delete-duplicate-lines turns it into

test
dup
one
two
three
five

You can optionally search backward by prefixing the command with the universal argument (C-u). The result is then

dup
two
three
one
test
five

Credit goes to emacsredux.com
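The same command can also be called from Lisp, which makes it easy to check the behaviour outside an interactive session. The following is a minimal sketch, assuming Emacs 24.4 or newer; the helper name my-dedupe-string is hypothetical and exists only for this illustration.

```elisp
;; Hypothetical helper (not part of Emacs): run `delete-duplicate-lines'
;; over TEXT in a temporary buffer and return the resulting text.
(defun my-dedupe-string (text)
  "Return TEXT with duplicate lines removed, keeping first occurrences."
  (with-temp-buffer
    (insert text)
    ;; BEG and END cover the whole temporary buffer.
    (delete-duplicate-lines (point-min) (point-max))
    (buffer-string)))

(my-dedupe-string "test\ndup\ndup\none\ntwo\none\nthree\none\ntest\nfive\n")
;; => "test\ndup\none\ntwo\nthree\nfive\n"
```

Evaluating the last form (for example with C-x C-e) should reproduce the forward result shown above.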

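Other, more roundabout options are available through Eshell, but none gives exactly the same result:

  1. sort -u; does not keep the relative order of the originals
  2. uniq; worse, it needs its input to be sorted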
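To see why uniq needs sorted input, it may help to reproduce its behaviour inside Emacs: uniq only drops a line when it equals the line directly before it. The sketch below relies on the optional ADJACENT argument of delete-duplicate-lines (its fourth argument); the helper name my-uniq-adjacent-string is hypothetical.

```elisp
;; Hypothetical helper: mimic uniq by deleting only adjacent duplicates.
(defun my-uniq-adjacent-string (text)
  "Return TEXT with consecutive repeated lines collapsed into one."
  (with-temp-buffer
    (insert text)
    ;; REVERSE is nil, ADJACENT is t: only consecutive repeats are deleted.
    (delete-duplicate-lines (point-min) (point-max) nil t)
    (buffer-string)))

(my-uniq-adjacent-string "test\ndup\ndup\none\ntwo\none\n")
;; => "test\ndup\none\ntwo\none\n"   (the second "one" survives)
```

Because the two "one" lines are not next to each other, the later copy is kept, which is exactly what happens with unsorted input piped through uniq.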

Answer 1: (score: 16)

Put this code in your .emacs:

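(defun uniq-lines (beg end)
  "Remove duplicate lines in region, keeping the first occurrence.
Called from a program, there are two arguments:
BEG and END (region to deduplicate)."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      ;; Match case-sensitively so "Foo" is not treated as a copy of "foo".
      (let ((case-fold-search nil))
        (while (not (eobp))
          ;; Kill the current line (with its newline) and put it straight
          ;; back; its text is now at the head of the kill ring.
          (kill-line 1)
          (yank)
          (let ((next-line (point)))
            ;; Delete every later copy of that line.
            (while
                (re-search-forward
                 (format "^%s" (regexp-quote (car kill-ring))) nil t)
              (replace-match "" nil nil))
            (goto-char next-line)))))))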

Usage:

M-x uniq-lines

Answer 2: (score: 7)

On Linux, select the region, then type

M-| uniq <RETURN>

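The result appears in a new buffer. Note that uniq only removes adjacent duplicates, so non-adjacent ones (as in the question's example) remain unless the input is sorted.

Answer 3: (score: 2)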

(require 'cl-lib)                       ; for cl-incf and cl-position-if

(defun unique-lines (start end)
  "This will remove all duplicating lines in the region.
Note empty lines count as duplicates of the empty line! All empty lines are
removed sans the first one, which may be confusing!"
  (interactive "r")
  (let ((hash (make-hash-table :test #'equal)) (i -1))
    (dolist (s (split-string (buffer-substring-no-properties start end) "$" t)
               ;; Result form of `dolist': runs once, after all lines are seen.
               (let ((lines (make-vector (1+ i) nil)))
                 ;; Each hash value is the index of a line's first occurrence.
                 (maphash
                  (lambda (key value) (setf (aref lines value) key))
                  hash)
                 (kill-region start end)
                 (insert (mapconcat #'identity lines "\n"))))
      (setq s                           ; because Emacs can't properly
                                        ; split lines :/
            (substring
             s (cl-position-if
                (lambda (x)
                  (not (or (char-equal ?\n x) (char-equal ?\r x)))) s)))
      (unless (gethash s hash)
        (setf (gethash s hash) (cl-incf i))))))

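This alternative:

  • Does not use the undo history to store matches.
  • Is generally faster (but if you are after ultimate speed, build a prefix tree).
  • Has the effect of replacing all previous newline characters, whatever they were, with \n (UNIX style). This may be a bonus or a disadvantage, depending on your situation.
  • Could be made a little better (faster) by re-implementing split-string so that it accepts characters instead of a regular expression.

A slightly longer, but perhaps more efficient, variant: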

(require 'cl-lib)                       ; for cl-incf

(defun split-string-chars (string chars &optional omit-nulls)
  "Split STRING at each character in the list CHARS.
If OMIT-NULLS is non-nil, empty substrings between separators are dropped."
  (let ((separators (make-hash-table))
        (last 0)
        current
        result)
    (dolist (c chars) (setf (gethash c separators) t))
    (dotimes (i (length string)
                ;; Result form: push the tail after the last separator.
                (progn
                 (when (< last i)
                   (push (substring string last i) result))
                 (reverse result)))
      (setq current (aref string i))
      (when (gethash current separators)
        ;; Keep empty pieces only when OMIT-NULLS is nil.
        (when (or (not omit-nulls) (/= last i))
          (push (substring string last i) result))
        (setq last (1+ i))))))

(defun unique-lines (start end)
  "This will remove all duplicating lines in the region.
Note empty lines count as duplicates of the empty line! All empty lines are
removed sans the first one, which may be confusing!"
  (interactive "r")
  (let ((hash (make-hash-table :test #'equal)) (i -1))
    (dolist (s (split-string-chars
                (buffer-substring-no-properties start end) '(?\n) t)
               ;; Result form of `dolist': rebuild the region from the table.
               (let ((lines (make-vector (1+ i) nil)))
                 (maphash
                  (lambda (key value) (setf (aref lines value) key))
                  hash)
                 (kill-region start end)
                 (insert (mapconcat #'identity lines "\n"))))
      (unless (gethash s hash)
        (setf (gethash s hash) (cl-incf i))))))
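For comparison, the same idea (remember what has been seen, keep first occurrences in order) can be sketched more compactly with the built-in delete-dups, which keeps the first of several equal list elements. This is only an illustrative sketch, not the answer's method: the name my-unique-lines-simple is hypothetical, it drops all empty lines because split-string is called with OMIT-NULLS, and it uses delete-region so the kill ring is left alone.

```elisp
;; Hypothetical command: deduplicate the region with `delete-dups'.
(defun my-unique-lines-simple (start end)
  "Remove duplicate lines between START and END, keeping first occurrences.
Empty lines are dropped entirely."
  (interactive "r")
  (let* ((text (buffer-substring-no-properties start end))
         ;; Remember whether the region ended with a newline.
         (trailing-newline (string-suffix-p "\n" text))
         ;; Split into non-empty lines and remove later duplicates.
         (lines (delete-dups (split-string text "\n" t))))
    (save-excursion
      (goto-char start)
      (delete-region start end)
      (insert (mapconcat #'identity lines "\n")
              (if trailing-newline "\n" "")))))
```

Select a region and run M-x my-unique-lines-simple to try it; for very large regions the hash-table versions above may still be the better fit.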

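Answer 4: (score: 0)

Another way:

  1. Select the text region.
  2. Type Ctrl-U (prefix), then M-| (shell command on region), then sort -u (the command to run on the selection; its output replaces the selection).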
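The two steps above can be wrapped in a small command. This is a sketch under stated assumptions: a sort executable is available on the PATH, and the command name my-sort-uniq-region is hypothetical. As with the manual steps, the original line order is not preserved.

```elisp
;; Hypothetical command: equivalent of C-u M-| sort -u on the region.
(defun my-sort-uniq-region (beg end)
  "Replace the region between BEG and END with the output of `sort -u'."
  (interactive "r")
  ;; OUTPUT-BUFFER and REPLACE are both non-nil, mirroring the C-u prefix:
  ;; the command's output replaces the region in the current buffer.
  (shell-command-on-region beg end "sort -u" t t))
```

After evaluating the definition, select a region and run M-x my-sort-uniq-region.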