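After reading the answers to this question: C# regex pattern to extract urls from given string - not full html urls but bare links as well, I am wondering which is the fastest way to extract URLs from a document: regex matching or string splitting.
So, you have a string containing an HTML document and you want to extract the URLs from it.
The regex approach: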
Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
foreach (Match m in linkParser.Matches(rawString))
    MessageBox.Show(m.Value);
The string-split approach:
string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
foreach (string s in links)
    MessageBox.Show(s);
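Before comparing speed, it may be worth checking that the two approaches return the same links, because on some inputs they do not. The following is a minimal sketch, not a definitive implementation: the class name `UrlExtractDemo`, the helper names, and the sample string are made up for illustration, and `StringComparison.Ordinal` is an assumption added here (the snippets above call `StartsWith` without it).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class UrlExtractDemo
{
    // Same pattern and options as the regex snippet above.
    private static readonly Regex LinkParser = new Regex(
        @"\b(?:https?://|www\.)\S+\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Same separators as the split snippet above.
    private static readonly char[] Separators = { '\t', '\n', ' ' };

    public static List<string> ExtractWithRegex(string text)
    {
        return LinkParser.Matches(text).Cast<Match>().Select(m => m.Value).ToList();
    }

    public static List<string> ExtractWithSplit(string text)
    {
        // Ordinal comparison is an assumption; it is case-sensitive, like the original calls.
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(s => s.StartsWith("http://", StringComparison.Ordinal)
                     || s.StartsWith("https://", StringComparison.Ordinal)
                     || s.StartsWith("www.", StringComparison.Ordinal))
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical input: trailing punctuation, parentheses, upper-case scheme.
        string sample = "see http://example.com. or (www.example.org) or HTTP://EXAMPLE.NET";
        Console.WriteLine(string.Join(" | ", ExtractWithRegex(sample)));
        Console.WriteLine(string.Join(" | ", ExtractWithSplit(sample)));
    }
}
```

On this sample the regex should find all three links and drop the trailing period, while the split version should return only `http://example.com.` (period included), since it requires the token to start with a lower-case prefix. So a speed comparison is only meaningful for inputs where both give acceptable results.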
Which of the two is the most efficient approach?
Answer 0 (score: 0)
Splitting is faster. Here is some code you can use to test it: dotnetfiddle link
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        const int iterations = 500;
        string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";

        // Build the regex once, outside the timed loop, so that construction
        // and compilation cost is not counted as matching time.
        Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
        char[] separators = "\t\n ".ToCharArray();
        int regexCount = 0;
        int splitCount = 0;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Reading Count forces the MatchCollection to find every match.
            regexCount += linkParser.Matches(rawString).Count;
        }
        sw.Stop();
        var test1Time = sw.ElapsedMilliseconds;

        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            // Count(predicate) enumerates the tokens; a bare Where() would be lazy and do no filtering.
            splitCount += rawString.Split(separators, StringSplitOptions.RemoveEmptyEntries)
                .Count(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
        }
        sw.Stop();
        var test2Time = sw.ElapsedMilliseconds;

        Console.WriteLine("Regex Test: " + test1Time + " ms (" + regexCount + " matches)");
        Console.WriteLine("Split Test: " + test2Time + " ms (" + splitCount + " matches)");
    }
}
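With a string this short and only 500 iterations, both loops may finish in around a millisecond, which is close to the resolution of `ElapsedMilliseconds`. A rough sketch of a slightly more careful measurement follows, assuming a larger input built by repeating the sample line and one untimed warm-up call per method. The names `ExtractBenchmark`, `Measure`, `BuildInput`, `CountWithRegex` and `CountWithSplit`, and the repeat and iteration counts, are all made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExtractBenchmark
{
    private static readonly char[] Separators = { '\t', '\n', ' ' };

    // Calls the action once untimed (warm-up), then times 'iterations' calls.
    public static TimeSpan Measure(Func<int> action, int iterations)
    {
        action(); // warm-up: JIT compilation and first-use costs
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }

    // Builds a larger document by repeating the sample line from the question.
    public static string BuildInput(int repeats)
    {
        string line = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue\n";
        return string.Concat(Enumerable.Repeat(line, repeats));
    }

    public static int CountWithRegex(Regex regex, string text)
    {
        return regex.Matches(text).Count;
    }

    public static int CountWithSplit(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Count(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
    }

    public static void Main()
    {
        Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
        string input = BuildInput(1000); // arbitrary size, chosen for illustration

        TimeSpan regexTime = Measure(() => CountWithRegex(linkParser, input), 100);
        TimeSpan splitTime = Measure(() => CountWithSplit(input), 100);

        Console.WriteLine("Regex: " + regexTime.TotalMilliseconds + " ms");
        Console.WriteLine("Split: " + splitTime.TotalMilliseconds + " ms");
    }
}
```

Timings from a sketch like this still vary by machine, runtime and input, so treat them as a rough comparison rather than a definitive result.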