How do I remove duplicate words in Notepad++?

Time: 2016-03-13 15:25:31

Tags: duplicates notepad++

I have a large text file that looks like this:

Mitchel-2
Anna-2
Witold-4
Serena-3
Serena-9
Witros-3

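I need the first word, the one before the "-", to never repeat. Is there any way to delete every occurrence except the first? For example, if I have 3000 lines that start with "Serena", each with a different number after the "-", is there a way to delete 2999 of the Serena lines and keep only the first one?

Serena is just an example; there are more than 200 other words that can be duplicated.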

1 answer:

Answer 0 (score: 0)

I don't think you can do this with Notepad++. You could use a regular expression for each name, but with more than 200 names that is impractical.

But you can write a program that does it for you. Basically it takes two steps:

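1) Search for every unique name and store it in a set (which does not allow duplicate entries).
2) For each unique name in the set, search the file for its duplicates.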

I wrote a simple C++ program that finds the duplicates in a string variable. You can adapt it to your needs. I compiled it with Microsoft Visual Studio Community 2015 (it does not work on cpp.sh).

#include "stdafx.h" // Visual Studio precompiled header; remove this line when building with another compiler
#include <regex>
#include <string>
#include <iostream>
#include <set>

using namespace std;

int main()
{

    set<string> names; // a set stores each distinct name only once

    string notepad_text = "Serena-1\nSerena-2\nSerena-3\nSerena-4\nAna-1\nSerena-5\nWilson-1\nAna-2\nJohn-1\nAna-3\nJohn-2\nWilson-2";
    regex regex_find_names("^\\w+"); //double slashes are needed because this is in a string

    // 1) Let's find every name

    sregex_iterator match_it(notepad_text.begin(), notepad_text.end(), regex_find_names);
    sregex_iterator it_end; // a default-constructed iterator marks the end of the matches

    while (match_it != it_end) {
        names.insert(match_it->str()); // the set ignores names it already contains
        ++match_it;
    }

    // 2) For demonstration purposes, let's print what we've found

    cout << "---printing the names we've found:\n\n";
    set<string>::const_iterator names_it; // declare an iterator
    names_it = names.begin();             // assign it to the start of the set
    while (names_it != names.end())       // while it hasn't reached the end
    {
        cout << *names_it << " ";
        ++names_it; 
    }

    // 3) Let's find the duplicates

    cout << "\n\n---printing the regex matches:\n";

    string current_name;
    set<string>::const_iterator current_name_it; //this iterates over every name we've found
    current_name_it = names.begin();
    while (current_name_it != names.end())
    {
        // we're building something like "^Serena-.*"; including the '-' keeps a name such as "Ana" from also matching "Anabel-1"
        current_name = "^";
        current_name += *current_name_it;
        current_name += "-.*";
        cout << "\n-Lets find duplicates of: " << *current_name_it << endl;
        ++current_name_it;

        // let's iterate through the matches
        regex regex_obj(current_name);
        sregex_iterator it_beg(notepad_text.begin(), notepad_text.end(), regex_obj); // stays on the first match, which is the line to keep
        sregex_iterator it(notepad_text.begin(), notepad_text.end(), regex_obj); // this iterates over the match results
        sregex_iterator it_end;

        while (it != it_end) {
            if (it != it_beg)
            {
                cout << it->str() << endl;
            }
            ++it;

        }

    }


    int i; // depending on the compiler, reading this extra value is necessary to keep the console window open
    cin >> i;
    return 0;
}

The input string is:

Serena-1
Serena-2
Serena-3
Serena-4
Ana-1
Serena-5
Wilson-1
Ana-2
John-1
Ana-3
John-2
Wilson-2

It prints:

---printing the names we've found:

Ana John Serena Wilson

---printing the regex matches:

-Lets find duplicates of: Ana
Ana-2
Ana-3

-Lets find duplicates of: John
John-2

-Lets find duplicates of: Serena
Serena-2
Serena-3
Serena-4
Serena-5

-Lets find duplicates of: Wilson
Wilson-2
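
The program above only prints the duplicate lines. If the goal is to actually drop them and keep the first line for each name, a single pass with a set of already-seen names may be enough. The following is a minimal sketch, not part of the original answer: the function name keep_first_per_name is hypothetical, and it assumes the name is everything before the first "-" on a line (a line without "-" is treated as a name by itself).

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (an illustration, not from the original answer):
// returns the input lines in their original order, dropping every line
// whose name (the text before the first '-') already appeared earlier.
std::vector<std::string> keep_first_per_name(const std::vector<std::string>& lines)
{
    std::unordered_set<std::string> seen; // names encountered so far
    std::vector<std::string> kept;        // lines that survive

    for (const std::string& line : lines) {
        // find() returns npos when there is no '-', and substr(0, npos)
        // then yields the whole line.
        const std::string name = line.substr(0, line.find('-'));

        // insert().second is true only if the name was not in the set yet.
        if (seen.insert(name).second) {
            kept.push_back(line);
        }
    }
    return kept;
}
```

To use it on a real file, one could read the lines with std::getline into a vector, call the function, and write the returned lines to a new file. Unlike the regex version, this looks at each line only once, so it should stay fast even with thousands of lines per name.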