Question

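I am trying to fetch an HTML page from this link and store its content in a specific file from C#, using the HAP (Html Agility Pack) library. I would like to use the Get method of the HtmlWeb class. The code compiles and runs without errors, but "file.txt" is never created. Here are the class and its client. Can anyone help?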

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace WebCrawler
{
    class Crawler
    {
        public Crawler() { }

        public Crawler(string Url)
        {
            this.Url = Url;
            HtmlWeb page = new HtmlWeb();
            Console.WriteLine(Url);
            HtmlDocument doc = page.Load(Url);
            page.Get(Url, "file.txt");
        }

        public string Url
        {
            get;
            set;
        }
    }
}


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace WebCrawler
{
    class Program
    {
        static void Main(string[] args)
        {
            Crawler crawler = new Crawler("https://code.google.com/p/abot/");
        }
    }
}



Thanks

Answer 1

Why not do something like this:

System.IO.File.WriteAllText(@"c:\file.txt", doc.DocumentNode.OuterHtml);

Answer 2

You have to call the Save method on the HtmlDocument object. Here is an example that loads the index page of the Google website and saves it to the file out.html.

const string url = "http://google.com";

HtmlWeb page = new HtmlWeb();
HtmlDocument document = page.Load(url);
page.Get(url, "/");
document.Save("out.html");
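For completeness, here is a minimal sketch of the same idea as a whole program. It assumes the HtmlAgilityPack package is referenced; the PageSaver class and its SaveTo helper are hypothetical names introduced only for illustration. Because a relative file name such as "out.html" is resolved against the process's current working directory, the sketch converts it to an absolute path and prints it, which may help when the file seems to be missing.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

static class PageSaver
{
    // Hypothetical helper (not part of HAP): saves the document's markup
    // to 'path' and returns the absolute path that was written.
    public static string SaveTo(HtmlDocument document, string path)
    {
        // Resolve a relative name against the current working directory.
        string fullPath = Path.GetFullPath(path);
        document.Save(fullPath);
        return fullPath;
    }

    static void Main()
    {
        const string url = "http://google.com";

        // Download and parse the page.
        HtmlWeb web = new HtmlWeb();
        HtmlDocument document = web.Load(url);

        // Write it to disk and report where it actually ended up.
        string savedAt = SaveTo(document, "out.html");
        Console.WriteLine("Saved to " + savedAt + " (exists: " + File.Exists(savedAt) + ")");
    }
}
```

When run from Visual Studio, the working directory is typically the build output folder (for example bin\Debug), so that is where a relative file name would be expected to land.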

Html Agility Pack: Get an HTML document from an Internet resource and save it to the specified file
