当您使用Regex搜索某些内容时,我正试图抓取Google在第一页上生成的10个网站的链接。我对Regex很陌生,并且在使用它时遇到了很多麻烦:
MatchCollection links = Regex.Matches(indexPage, @"<h3 class=""r""><a href=""\s*(.+?)\s*"" class=l", RegexOptions.Multiline);
一旦我在集合中有链接,我就将它们添加到列表中:
foreach (Match link in links) {
string result = link.Groups[1].Value;
results.Add(result);
}
没有找到任何链接,任何帮助都会非常感谢
答案 0 :(得分:1)
找到所有网址:
"#^((?#
the scheme:
)(?:https?://)(?#
second level domains and beyond:
)(?:[\S]+\.)+((?#
top level domains:
)MUSEUM|TRAVEL|AERO|ARPA|ASIA|EDU|GOV|MIL|MOBI|(?#
)COOP|INFO|NAME|BIZ|CAT|COM|INT|JOBS|NET|ORG|PRO|TEL|(?#
)A[CDEFGILMNOQRSTUWXZ]|B[ABDEFGHIJLMNORSTVWYZ]|(?#
)C[ACDFGHIKLMNORUVXYZ]|D[EJKMOZ]|(?#
)E[CEGHRSTU]|F[IJKMOR]|G[ABDEFGHILMNPQRSTUWY]|(?#
)H[KMNRTU]|I[DELMNOQRST]|J[EMOP]|(?#
)K[EGHIMNPRWYZ]|L[ABCIKRSTUVY]|M[ACDEFGHKLMNOPQRSTUVWXYZ]|(?#
)N[ACEFGILOPRUZ]|OM|P[AEFGHKLMNRSTWY]|QA|R[EOSUW]|(?#
)S[ABCDEGHIJKLMNORTUVYZ]|T[CDFGHJKLMNOPRTVWZ]|(?#
)U[AGKMSYZ]|V[ACEGINU]|W[FS]|Y[ETU]|Z[AMW])(?#
the path, can be there or not:
)(/[a-z0-9\._/~%\-\+&\#\?!=\(\)@]*)?)$#i"