Question

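I have been trying to see whether I can pull the timetable data from my school's website and build something on top of it. This is what I have so far: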

using System;
using System.IO;
using System.Net;
using System.Text;

string userInput = "/*My username will be here*/";
string passInput = "/*My password will be here */";

string formUrl = "https://portal.gc.ac.nz/student/index.php/process-login";
// URL-encode the values so characters such as '&' or '=' do not corrupt the form body.
string formParams = string.Format("username={0}&password={1}",
    Uri.EscapeDataString(userInput), Uri.EscapeDataString(passInput));
string cookieHeader;

// POST the login form.
WebRequest req = WebRequest.Create(formUrl);
req.ContentType = "application/x-www-form-urlencoded";
req.Method = "POST";
byte[] bytes = Encoding.UTF8.GetBytes(formParams);
req.ContentLength = bytes.Length;
using (Stream os = req.GetRequestStream())
{
    os.Write(bytes, 0, bytes.Length);
}
WebResponse resp = req.GetResponse();
// Keep the session cookie from the login response. This forwards the raw
// Set-Cookie value, attributes included; a CookieContainer is more robust.
cookieHeader = resp.Headers["Set-Cookie"];

// GET the timetable page, sending the session cookie back.
string pageSource;
string getUrl = "https://portal.gc.ac.nz/student/index.php/timetable";
WebRequest getRequest = WebRequest.Create(getUrl);
getRequest.Headers.Add("Cookie", cookieHeader);
WebResponse getResponse = getRequest.GetResponse();
using (StreamReader sr = new StreamReader(getResponse.GetResponseStream()))
{
    pageSource = sr.ReadToEnd();
}

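I haven't found a way to check whether the code above actually works, but my question is:

How do I get at the data (text) I need from the page? I want to extract the subject names. Part of the HTML looks like this: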

Answer 1

There are a few ways to do this: one is to match the tags with a regular expression and take their contents; another is to use the HtmlAgilityPack library.
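Since the relevant HTML isn't shown in the question, here is a rough sketch of both approaches against made-up markup. The `SampleHtml` string, the `subject` class name, and the `SubjectExtractor` helper are all assumptions for illustration; you would need to adapt the regex pattern and the XPath to whatever the real timetable page uses. The second method assumes the HtmlAgilityPack NuGet package is installed.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // third-party: NuGet package "HtmlAgilityPack"

static class SubjectExtractor
{
    // Hypothetical markup: the real timetable HTML is not shown in the question.
    public const string SampleHtml =
        "<table><tr><td class=\"subject\">Mathematics</td></tr>" +
        "<tr><td class=\"subject\">Art &amp; Design</td></tr></table>";

    // Approach 1: regular expression. Quick, but fragile if the markup changes.
    public static List<string> WithRegex(string html)
    {
        var names = new List<string>();
        var pattern = new Regex(
            "<td[^>]*class=\"subject\"[^>]*>(.*?)</td>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        foreach (Match m in pattern.Matches(html))
        {
            // Group 1 is the cell content; decode entities such as &amp;.
            names.Add(WebUtility.HtmlDecode(m.Groups[1].Value).Trim());
        }
        return names;
    }

    // Approach 2: parse the document with HtmlAgilityPack and query it with XPath.
    public static List<string> WithHtmlAgilityPack(string html)
    {
        var names = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//td[@class='subject']");
        if (nodes == null)
        {
            return names;
        }
        foreach (HtmlNode node in nodes)
        {
            names.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return names;
    }
}

static class Program
{
    static void Main()
    {
        // In the question's code you would pass pageSource instead of SampleHtml.
        foreach (string name in SubjectExtractor.WithHtmlAgilityPack(SubjectExtractor.SampleHtml))
        {
            Console.WriteLine(name);
        }
    }
}
```

With the sample markup both methods should give the same two names. On a real page the parser-based version tends to hold up better, because it doesn't depend on attribute order or whitespace the way the regex does.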

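If you don't have to do this in C#, I would strongly suggest using another language such as Python or Perl. It looks like you are trying to scrape data, and in that case I strongly recommend Python's Scrapy framework if at all possible. It is the best scraping tool I have come across, and it lets you pull out data easily with XPath. See Scrapy's website.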

C# read data off html
