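I'm having trouble getting my regular expression to work. I'm trying to extract only the URLs from a string. Here is some of the text in the string: pastebin.com/wA9N1Gbi. The regex I'm trying to use is
(?<protocol>https?:\/\/)(?:(?<urlroot>[^\/?#\n\s]+))?(?<urlResource>[^?#\n\s]+)?(?<queryString>\?(?:[^#\n\s]*))?(?:#(?<fragment>[^\n\s]))?
Here is a link: regex101.com/r/bH1eS9/3
It does not work properly. When I run the program I get the following error: "Unhandled exception at 0x7638DAE8 in Historik.exe: Microsoft C++ exception: std::regex_error at memory location 0x0018ED9C." Does anyone know how I can do this? Is there another regex function that might be better suited to this task?
Here is the code I'm using at the moment. Thanks in advance.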
#include <fstream>
#include <iostream>
#include <regex>
#include <sstream>
#include <string>

int main()
{
    // Read the whole history file into one string.
    std::ifstream in("c:/Users/Petrus/Documents/History", std::ios::binary);
    std::stringstream buffer;
    buffer << in.rdbuf();
    std::string contents(buffer.str());

    unsigned counter = 0;
    // Constructing this regex is where std::regex_error is thrown:
    // std::regex does not support named groups such as (?<protocol>...).
    std::regex word_regex(
        R"((?<protocol>https?:\/\/)(?:(?<urlroot>[^\/?#\n\s]+))?(?<urlResource>[^?#\n\s]+)?(?<queryString>\?(?:[^#\n\s]*))?(?:#(?<fragment>[^\n\s]))?)",
        std::regex::extended
    );
    auto words_begin = std::sregex_iterator(contents.begin(), contents.end(), word_regex);
    auto words_end = std::sregex_iterator();
    for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
        std::smatch match = *i;
        // Print the full match followed by each capture group.
        for (const auto& res : match) {
            std::cout << ++counter << ": " << res << std::endl;
        }
    }
}
Answer 0 (score: 0)
Do you need such a complex regex? Could you get away with something less strict?
#include <fstream>
#include <iostream>
#include <regex>
#include <sstream>
#include <stdexcept>
#include <string>

// Read the whole file into a string; throw if it cannot be opened.
std::string load_file(const std::string& filename)
{
    std::ostringstream oss;
    if(auto ifs = std::ifstream(filename, std::ios::binary))
        oss << ifs.rdbuf();
    else
        throw std::runtime_error("Failed to open file: " + filename);
    return oss.str();
}

int main()
{
    std::string s = load_file("test.txt");
    // crude... but effective?
    std::regex e(R"(https?:\/\/[^/]+[[:print:][:punct:]]*)");
    auto itr = std::sregex_iterator(s.begin(), s.end(), e);
    auto end = std::sregex_iterator();
    unsigned counter = 0;
    // Print each whole match (group 0) with a running index.
    for(; itr != end; ++itr)
        std::cout << ++counter << ": " << itr->str(0) << '\n';
}
Output:
1: http://boplats.vaxjo.se/
2: http://192.168.0.7/
3: http://old.honeynet.org/
4: http://old.honeynet.org/scans/scan15/som/som11.txt
5: http://en.hackdig.com/
6: http://parallelrecovery.com/pdf-password.html
7: http://digitalcorpora.org/corp
8: http://tv4play.se/program/nyhetsmorgon
9: http://bredbandskollen.se/
10: http://194.47.149.19/dv1482/Lab5/
...
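
If you do want the individual parts of each URL, here is a rough sketch of how the original pattern could be adapted. std::regex has no named groups like (?<protocol>...), which is why the constructor throws, so the sketch uses plain numbered groups with the default ECMAScript grammar. The names pattern_compiles, UrlParts and extract_urls are made up for illustration, and the fragment group is widened from one character to [^\s]* on the assumption that the single-character class in the question was unintended.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: reports whether std::regex accepts a pattern
// (default ECMAScript grammar) instead of letting regex_error escape.
bool pattern_compiles(const std::string& pattern)
{
    try {
        std::regex re(pattern);
        (void)re;
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Hypothetical container for the pieces the question's named groups described.
struct UrlParts {
    std::string protocol;
    std::string root;
    std::string resource;
    std::string query;
    std::string fragment;
};

// Sketch: same structure as the question's pattern, but with numbered groups.
// Group 1 = protocol, 2 = root, 3 = resource, 4 = query string, 5 = fragment.
std::vector<UrlParts> extract_urls(const std::string& text)
{
    static const std::regex url_regex(
        R"((https?://)([^/?#\s]+)?([^?#\s]+)?(\?[^#\s]*)?(?:#([^\s]*))?)");

    std::vector<UrlParts> result;
    auto it = std::sregex_iterator(text.begin(), text.end(), url_regex);
    const auto end = std::sregex_iterator();
    for (; it != end; ++it) {
        const std::smatch& m = *it;
        // str(n) yields an empty string for groups that did not participate.
        result.push_back({m.str(1), m.str(2), m.str(3), m.str(4), m.str(5)});
    }
    return result;
}
```

Calling extract_urls(s) on the string returned by load_file gives one UrlParts per match, and pattern_compiles is handy for checking a pattern up front when you would rather not hit an unhandled exception.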