Regular expression is not valid

Time: 2016-05-17 16:45:52

Tags: c++ regex url

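I am having trouble getting my regular expression to work. I am trying to extract only the URLs from a string. Here is some of the text in the string: pastebin.com/wA9N1Gbi. The regex I am trying to use is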

(?<protocol>https?:\/\/)(?:(?<urlroot>[^\/?#\n\s]+))?(?<urlResource>[^?#\n\s]+)?(?<queryString>\?(?:[^#\n\s]*))?(?:#(?<fragment>[^\n\s]))?

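Here is a link: regex101.com/r/bH1eS9/3

It does not work properly. When I compile and run it I get the following error: "Unhandled exception at 0x7638DAE8 in Historik.exe: Microsoft C++ exception: std::regex_error at memory location 0x0018ED9C." Does anyone know how I can do this? Is there another regex function that might be better suited to this task?

This is the code I am using at the moment. Thanks in advance.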

string str;
std::ifstream in("c:/Users/Petrus/Documents/History", std::ios::binary);
std::stringstream buffer;

buffer << in.rdbuf();

std::string contents(buffer.str());

unsigned counter = 0;
std::regex word_regex(
    R"((?<protocol>https?:\/\/)(?:(?<urlroot>[^\/?#\n\s]+))?(?<urlResource>[^?#\n\s]+)?(?<queryString>\?(?:[^#\n\s]*))?(?:#(?<fragment>[^\n\s]))?)",
    std::regex::extended
    );
auto words_begin = std::sregex_iterator(contents.begin(), contents.end(), word_regex);
auto words_end = std::sregex_iterator();

for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
    std::smatch match = *i;
    std::string match_str = match.str();
    for (const auto& res : match) {
        counter++;
        std::cout << counter++ << ": " << res << std::endl;
    }
}

1 answer:

Answer 0 (score: 0)

Do you need such a complicated regex? Could you get away with something less strict?

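#include <fstream>
#include <iostream>
#include <regex>
#include <sstream>
#include <stdexcept>
#include <string>

std::string load_file(const std::string& filename)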
{
    std::ostringstream oss;
    if(auto ifs = std::ifstream(filename, std::ios::binary))
        oss << ifs.rdbuf();
    else
        throw std::runtime_error("Failed to open file: " + filename);
    return oss.str();
}

int main(int, const char* const*)
{
    std::string s = load_file("test.txt");

    // crude... but effective?
    std::regex e(R"(https?:\/\/[^/]+[[:print:][:punct:]]*)");

    auto itr = std::sregex_iterator(s.begin(), s.end(), e);
    auto end = std::sregex_iterator();

    unsigned counter = 0;
    for(; itr != end; ++itr)
        std::cout << ++counter << ": " << itr->str(0) << '\n';

}

Output:

1: http://boplats.vaxjo.se/
2: http://192.168.0.7/
3: http://old.honeynet.org/
4: http://old.honeynet.org/scans/scan15/som/som11.txt
5: http://en.hackdig.com/
6: http://parallelrecovery.com/pdf-password.html
7: http://digitalcorpora.org/corp
8: http://tv4play.se/program/nyhetsmorgon
9: http://bredbandskollen.se/
10: http://194.47.149.19/dv1482/Lab5/
...
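
As for why the original pattern throws: std::regex does not support named capture groups such as `(?<protocol>...)` in any of its grammars, and the `std::regex::extended` (POSIX ERE) grammar does not support `(?:...)` either, so the constructor most likely rejects the pattern with std::regex_error. If the individual URL parts are still wanted, one option is to keep the same structure with numbered groups under the default ECMAScript grammar. The following is only a sketch under that assumption; `UrlParts`, `compiles` and `extract_urls` are hypothetical names, not from the question, and the fragment group uses `*` where the question's pattern matches a single character.

```cpp
#include <iostream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical holder for the pieces of one matched URL (not from the post).
struct UrlParts {
    std::string protocol, root, resource, query, fragment;
};

// Returns true if `pattern` compiles with the default ECMAScript grammar.
// On failure the std::regex_error message is written to stderr.
bool compiles(const std::string& pattern)
{
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error& e) {
        std::cerr << "regex_error: " << e.what() << '\n';
        return false;
    }
}

// Extracts every http/https URL from `text`, split into numbered groups:
// 1 = protocol, 2 = root (host), 3 = resource, 4 = query string, 5 = fragment.
std::vector<UrlParts> extract_urls(const std::string& text)
{
    static const std::regex url_re(
        R"((https?://)([^/?#\s]+)([^?#\s]+)?(\?[^#\s]*)?(?:#([^\s]*))?)");

    std::vector<UrlParts> out;
    for (std::sregex_iterator it(text.begin(), text.end(), url_re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // Unmatched optional groups yield empty strings.
        out.push_back({m[1].str(), m[2].str(), m[3].str(),
                       m[4].str(), m[5].str()});
    }
    return out;
}
```

Calling `compiles` on a pattern containing `(?<protocol>...)` should return false and print the library's error message, which is a quick way to confirm the cause before rewriting the pattern. `extract_urls(contents)` can then replace the iterator loop in the question, with each part read by field name instead of by group name.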