Can't get groups out of a regex

Time: 2013-02-19 04:38:00

Tags: c# regex

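The following code only returns "Good job!". How do I get the actual URL out of it? I followed the tutorial on the site given, but I'm still having a bit of trouble wrapping my head around it. Also, I don't think regex is the best way to do this (mixing regex with HTML). Is there a simple way to capture text based on its CSS class?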

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;

namespace Scraper
{
    class Program
    {
        static void Main(string[] args)
        {
            string target = @"http://www.omegacoder.com/?p=58";
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(target);
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();

            Regex URL = new Regex("(?:href=)(?<link>.*?)");

            string line;
            using (Stream responseStream = response.GetResponseStream())
            using (StreamReader htmlStream = new StreamReader(responseStream))
            {
                while ((line = htmlStream.ReadLine()) != null)
                {
                    Match m = URL.Match(line);

                    if (m.Success)
                    {
                        Console.WriteLine("Good job! " + URL.Match(line) + m.Groups[0].Value + m.Groups[1].Value + m.Groups["link"]);
                        Console.ReadLine();
                    }
                }
                /*    if (Regex.IsMatch(line, "XXXXX"))
                            Console.WriteLine(line);
                } */
            }
            Console.ReadLine();
        }
    }
}
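To see why the `link` group in the code above comes back empty, here is a minimal sketch that runs the same pattern against a single line of HTML instead of a live page. It is written with C# 9 top-level statements, and the sample line is a hypothetical stand-in for whatever the page actually contains.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample line; any line containing href= behaves the same way.
string line = "<a href=\"http://www.omegacoder.com/?p=58\">link</a>";

// The pattern from the question: the lazy .*? has nothing after it,
// so the engine is satisfied as soon as it has matched zero characters.
Match lazy = Regex.Match(line, "(?:href=)(?<link>.*?)");

Console.WriteLine("Success: " + lazy.Success);                        // Success: True
Console.WriteLine("Whole match: [" + lazy.Value + "]");                // Whole match: [href=]
Console.WriteLine("link group: [" + lazy.Groups["link"].Value + "]");  // link group: []
```

The match succeeds, which is why the `if (m.Success)` branch runs, but the captured text is an empty string.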

1 answer:

Answer 0 (score: 0)

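You should use (?:href=)(?<link>\S*)

\S matches any non-whitespace character. In your pattern, .*? is lazy and nothing follows it, so the engine stops after matching zero characters and the link group is always empty.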
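A minimal sketch of the answer's pattern in action, again as C# 9 top-level statements on a hypothetical sample line. It also shows a caveat: because \S* stops only at whitespace, a quoted href value comes back with its quotes and the rest of the tag attached. The `ExtractHrefs` helper is a hypothetical, tighter variant (not from the answer) that assumes href values are either double-quoted or unquoted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample line with one quoted and one unquoted href.
string line = "<a href=\"http://www.omegacoder.com/?p=58\">link</a> <a href=/about>About</a>";

// The answer's pattern: \S* is greedy and stops only at whitespace,
// so with a quoted value it also takes the quotes and the rest of the tag.
Match answer = Regex.Match(line, @"(?:href=)(?<link>\S*)");
// prints, including the leading quote: "http://www.omegacoder.com/?p=58">link</a>
Console.WriteLine(answer.Groups["link"].Value);

// Hypothetical helper: skips an optional opening double quote and stops
// at a double quote, whitespace, or '>'.
// Single-quoted values and spaces around '=' are not handled.
List<string> ExtractHrefs(string html)
{
    var links = new List<string>();
    var href = new Regex(@"href=""?(?<link>[^""\s>]+)");
    foreach (Match m in href.Matches(html))
    {
        links.Add(m.Groups["link"].Value);
    }
    return links;
}

foreach (string url in ExtractHrefs(line))
{
    Console.WriteLine(url);
}
// http://www.omegacoder.com/?p=58
// /about
```

This stays a line-by-line regex heuristic. For selecting elements by CSS class, as the question asks, an HTML parser such as HtmlAgilityPack is the more common choice in C#.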