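I'm having trouble using Regex in an ASP.NET website. I want to get the match counts of two regular expressions against a site's source, which I download with WebClient.
Specifically, I want to count the regex matches in search results on tmz.com.
Basically, I have a text file with one keyword per line. I supply two artists, and for each line the code should build a regex of the form artist1 + regex + keyword + regex + artist2. The idea is to count how often my keywords appear together with the artists (the number of hits in the search results).
Here is the source code of my function: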
int counts = 0;
string line = "";
StreamReader read = new StreamReader(@"C:\words.txt");
WebClient web = new WebClient();
string content = web.DownloadString("http://www.tmz.com/search/news/" + artist1 + " " + artist2);
while ((line = read.ReadLine()) != null)
{
    string pattern = artist1 + "[a-zA-Z0-9\\s]{1,10}" + line + "[a-zA-Z0-9\\s]{1,10}" + artist2;
    MatchCollection matches = Regex.Matches(pattern, content);
    counts += matches.Count;
    string pattern2 = artist2 + "[a-zA-Z0-9\\s]{1,10}" + line + "[a-zA-Z0-9\\s]{1,10}" + artist1;
    MatchCollection matches1 = Regex.Matches(pattern2, content);
    counts += matches1.Count;
}
read.Close();
return counts;
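But I get this error: Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
The line highlighted in red (where the error occurs) is: Line 54: MatchCollection matches = Regex.Matches(pattern, content);
The exact exception: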
System.ArgumentException was unhandled by user code
Message=parsing "<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:meebo="http://www.meebo.com/" xmlns:og="http://ogp.me/ns#" xmlns:fb="http://ogp.me/ns/fb#">
<head>
<script type="text/javascript">var _sf_startpt=(new Date()).getTime()</script>
<title>Search bowwow tyra - news - Page 1 | TMZ.com </title>
<meta name="robots" content="all"/>
<meta name="description" content="Celebrity Gossip and Entertainment News, Covering Celebrity News and Hollywood Rumors. Get All The Latest Gossip at TMZ - Thirty Mile Zone" />
<meta name="generator" content="Crowd Fusion 2.0-enterprise" />
<!-- Site Verification -->
<meta name="google-site-verification" content="UUmtbUBf3djgPpCeLefe_PbFsOc6JGxfXmHzpjFLAEQ" />
<meta name="verify-v1" content="Wtpd0N6FufoE2XqopQJoTjWV6Co/Mny9BTaswPJbPPA=" />
<meta name="msvalidate.01" content="AFEB17971BCF30779AEA662782EF26F4" />
<meta name="y_..." - Too many )'s.
Source=System
StackTrace:
at System.Text.RegularExpressions.RegexParser.ScanRegex()
at System.Text.RegularExpressions.RegexParser.Parse(String re, RegexOptions op)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options, Boolean useCache)
at System.Text.RegularExpressions.Regex.Matches(String input, String pattern)
at _Default.tmz(String artist1, String artist2) in c:\Users\icebox19\Documents\Visual Studio 2010\WebSites\WebSite3\Default.aspx.cs:line 52
at _Default.Button1_Click(Object sender, EventArgs e) in c:\Users\icebox19\Documents\Visual Studio 2010\WebSites\WebSite3\Default.aspx.cs:line 89
at System.Web.UI.WebControls.Button.OnClick(EventArgs e)
at System.Web.UI.WebControls.Button.RaisePostBackEvent(String eventArgument)
at System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(String eventArgument)
at System.Web.UI.Page.RaisePostBackEvent(IPostBackEventHandler sourceControl, String eventArgument)
at System.Web.UI.Page.RaisePostBackEvent(NameValueCollection postData)
at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
InnerException:
PS: This is an ASP.NET website (a web page), not a standalone C# application.
Answer 0 (score: 3)
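This is because you passed the arguments in the wrong order. It should be Regex.Matches(input, pattern). With the arguments swapped, the downloaded HTML is parsed as the regular expression, which is why the exception message quotes the page source and ends with "Too many )'s".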
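For illustration, here is a minimal sketch of the counting loop with the arguments in the right order. The class name KeywordCounter and the method CountMatches are hypothetical, not from the question, and the page content is passed in as a string so the example does not depend on the download. The use of Regex.Escape is an extra assumption on my part: it treats artist names and keywords as literal text, which may or may not be what the asker wants.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeywordCounter
{
    // Hypothetical helper: counts "artist ... keyword ... artist" matches
    // in both orders for every keyword.
    public static int CountMatches(string content, string artist1,
                                   string artist2, IEnumerable<string> keywords)
    {
        // Same gap as in the question: 1 to 10 letters, digits, or whitespace.
        const string gap = @"[a-zA-Z0-9\s]{1,10}";

        // Assumption: names and keywords are literal text, so escape them.
        string a1 = Regex.Escape(artist1);
        string a2 = Regex.Escape(artist2);

        int count = 0;
        foreach (string keyword in keywords)
        {
            string k = Regex.Escape(keyword);

            // Regex.Matches takes the INPUT first and the PATTERN second.
            count += Regex.Matches(content, a1 + gap + k + gap + a2).Count;
            count += Regex.Matches(content, a2 + gap + k + gap + a1).Count;
        }
        return count;
    }
}
```

In the original function, content would be the string returned by web.DownloadString(...) and the keywords would be the lines read from words.txt.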