I have a large text file that looks like this:
Mitchel-2
Anna-2
Witold-4
Serena-3
Serena-9
Witros-3
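I need the first word (the part before the "-") to never repeat: for each word, every line except the first one should be deleted. So if I have, say, 3000 lines that start with "Serena", each with a different number after the "-", is there a way to delete 2999 of the Serena lines and keep only the first one?
Serena is just an example; there are more than 200 other words that can be duplicated.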
Answer 0: (score: 0)
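I don't think you can do this with Notepad++ alone. You could use a regular expression for each name, but with more than 200 names that is impractical.
But you can write a program that does it for you. Basically, you go through two steps:
1) Search for every unique name and save it in a set (which does not allow duplicate entries).
2) For each unique name in the set, search the file for its duplicates.
I wrote a simple C++ program that finds the duplicates in a string variable. You can adapt it to your needs. I compiled it with Microsoft Visual Studio Community 2015 (it does not work on cpp.sh).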
#include "stdafx.h" // Visual Studio precompiled header; remove it when building with another compiler
#include <regex>
#include <string>
#include <iostream>
#include <set>
using namespace std;

int main()
{
    set<string> names;
    string notepad_text = "Serena-1\nSerena-2\nSerena-3\nSerena-4\nAna-1\nSerena-5\nWilson-1\nAna-2\nJohn-1\nAna-3\nJohn-2\nWilson-2";
    // Double backslashes are needed because the pattern is written as a string literal.
    // Note: this relies on ^ matching at the start of every line, not only at the start of the text.
    regex regex_find_names("^\\w+");
    sregex_iterator it_end; // a default-constructed iterator marks the end of the matches

    // 1) Let's find every name
    sregex_iterator find_names_it(notepad_text.begin(), notepad_text.end(), regex_find_names);
    while (find_names_it != it_end) {
        names.insert(find_names_it->str()); // a set silently ignores duplicates
        ++find_names_it;
    }

    // 2) For demonstration purposes, let's print what we've found
    cout << "---printing the names we've found:\n\n";
    for (const string& name : names) {
        cout << name << " ";
    }

    // 3) Let's find the duplicates
    cout << "\n\n---printing the regex matches:\n";
    for (const string& name : names) {
        cout << "\n-Lets find duplicates of: " << name << endl;
        // Build something like "^Serena-.*"; the '-' keeps "Serena" from also matching "Serenade-1"
        regex regex_obj("^" + name + "-.*");
        sregex_iterator it_beg(notepad_text.begin(), notepad_text.end(), regex_obj);
        // Iterate over the matches for this name
        for (sregex_iterator it = it_beg; it != it_end; ++it) {
            if (it != it_beg) // skip the first occurrence and print only the duplicates
            {
                cout << it->str() << endl;
            }
        }
    }

    int i; // depending on the compiler, waiting for input is needed to keep the console window open
    cin >> i;
    return 0;
}
The input string is:
Serena-1
Serena-2
Serena-3
Serena-4
Ana-1
Serena-5
Wilson-1
Ana-2
John-1
Ana-3
John-2
Wilson-2
It prints:
---printing the names we've found:
Ana John Serena Wilson
---printing the regex matches:
-Lets find duplicates of: Ana
Ana-2
Ana-3
-Lets find duplicates of: John
John-2
-Lets find duplicates of: Serena
Serena-2
Serena-3
Serena-4
Serena-5
-Lets find duplicates of: Wilson
Wilson-2