Question

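There are many ways to remove duplicate lines, but I want to keep only the unique lines and delete every line that has a duplicate.

From something like this: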

Duplicate
Duplicate
Important text
Other duplicate
Important text1
Other duplicate

To get this:

Important text
Important text1

I need to remove thousands of lines, and only 10-20 unique lines are mixed in among all those duplicates.

Answer 1

I think a regular expression can help. You could first identify the duplicated lines with something like this:

^(.+)$(?=[\s\S]*^(\1)$[\s\S]*)


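Then delete every occurrence of the matched fragment from the text. However, I don't think Notepad++ has such a feature.

This regex matches only the first occurrence and captures the second occurrence in a group. But a regex cannot match non-contiguous text.

Example in Java: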

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test{
    public static void main(String[] args){
        String test = "Duplicate\n" +
                "Duplicate\n" +
                "Important text\n" +
                "Other duplicate\n" +
                "Important text1\n" +
                "Other duplicate";
        String result = test;
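        // Find every line that occurs again somewhere later in the text.
        Matcher matcher = Pattern.compile("^(.+)$(?=[\\s\\S]*^(\\1)$[\\s\\S]*)",Pattern.MULTILINE).matcher(test);
        while(matcher.find()){
            // Quote the line so regex metacharacters in it are taken literally,
            // and anchor it so only whole lines (not substrings) are blanked out.
            String wholeLine = "(?m)^" + Pattern.quote(matcher.group()) + "$";
            result = result.replaceAll(wholeLine,"");
        }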
        System.out.println(result);
    }
}

Result:

Important text

Important text1

However, if you run Replace All with this regex in Notepad++, it should leave just one occurrence of each duplicated line.
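
As an alternative to the regex, the same goal can be reached by counting how often each line occurs and keeping only the lines seen exactly once. The following is a minimal sketch of that idea, not the method from the answer above; the class name UniqueLines and the method keepUnique are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UniqueLines {
    // Returns the lines that occur exactly once, in their original order.
    static List<String> keepUnique(String text) {
        // LinkedHashMap remembers the order in which lines were first seen.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            counts.merge(line, 1, Integer::sum);
        }
        List<String> unique = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 1) {
                unique.add(entry.getKey());
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        String test = "Duplicate\n" +
                "Duplicate\n" +
                "Important text\n" +
                "Other duplicate\n" +
                "Important text1\n" +
                "Other duplicate";
        // Prints "Important text" and "Important text1" on separate lines.
        System.out.println(String.join("\n", keepUnique(test)));
    }
}
```

Unlike the replacement loop, this leaves no empty lines behind, because the output is rebuilt from the surviving lines rather than edited in place.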

Answer 2

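If you are on a Unix system and the lines are in a file, you can open a terminal and run

$ sort -u file.txt > uniqelines.txt

Note that sort -u keeps one copy of every duplicated line and reorders the output. To keep only the lines that occur exactly once, as the question asks, sort file.txt | uniq -u can be used instead.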
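
To make the difference between "one copy of each line" and "only lines that were never duplicated" concrete, here is a small Java sketch that produces both results for the sample from the question. The class and method names are hypothetical, and the ordering comes from Java's natural String order, which may differ from the locale-dependent order that sort uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedLines {
    // Counts each line; TreeMap keeps the keys in sorted order.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum);
        }
        return counts;
    }

    // Roughly what "sort -u" gives: one copy of every distinct line.
    static List<String> distinct(List<String> lines) {
        return new ArrayList<>(count(lines).keySet());
    }

    // Roughly what "sort | uniq -u" gives: only lines that occur once.
    static List<String> singletons(List<String> lines) {
        List<String> out = new ArrayList<>();
        count(lines).forEach((line, n) -> {
            if (n == 1) {
                out.add(line);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Duplicate", "Duplicate",
                "Important text", "Other duplicate",
                "Important text1", "Other duplicate");
        System.out.println(distinct(lines));   // four entries, duplicates collapsed
        System.out.println(singletons(lines)); // [Important text, Important text1]
    }
}
```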

Answer 3

Try using:

Find: ^(.+)\R([\s\S]*?)\1$
Replace with: $2


Make sure to check Regular Expression and Case sensitive, but not ". matches newline".
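
To see what a single pass of this find/replace does, the same pattern can be tried in Java, whose regex engine also supports \R and multiline anchors. This is only a sketch under the assumption that Notepad++ behaves the same way for this pattern; the class name FindReplaceDemo and the method onePass are illustrative.

```java
class FindReplaceDemo {
    // Applies the find/replace once over the whole text.
    // (?m) makes ^ and $ match at line boundaries; "." does not match
    // newlines by default, mirroring the unchecked ". matches newline" box.
    static String onePass(String text) {
        return text.replaceAll("(?m)^(.+)\\R([\\s\\S]*?)\\1$", "$2");
    }

    public static void main(String[] args) {
        String test = "Duplicate\n" +
                "Duplicate\n" +
                "Important text\n" +
                "Other duplicate\n" +
                "Important text1\n" +
                "Other duplicate";
        // Both duplicated lines are gone, but empty lines remain
        // where the second copies used to be.
        System.out.println(onePass(test));
    }
}
```

On the sample this yields an empty line followed by the two important lines. Two caveats follow from how the pattern pairs occurrences: a line that appears three times keeps one copy after the pass, and interleaved duplicates may need the replacement to be run more than once.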

Remove/delete all duplicated lines
