How do I scrape the content of an .axd resource?

Asked: 2009-10-16 22:24:41

Tags: c# asp.net screen-scraping

I have an img tag whose src attribute is /ChartImg.axd?i=chart_0_0.png&g=06469eea67ea452b977f8e73cad70691. Do I need to create another WebRequest to get the content of this resource, or is there an easier way?

I am scraping the output of the current request. Here is what I have so far.

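In some cases my additionalAssets collection will contain relative URIs of .axd resources. I want to include the content of those resources in the archive I am building.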

    private void ProcessPrintRequest()
    {
        this.Response.Clear();
        this.Response.ContentType = "application/zip";
        this.Response.AddHeader("Content-Disposition", "attachment;filename=archive.zip");

        using (var stream = new ZipOutputStream(new ZeroByteStreamWrapper(this.Response.OutputStream)))
        {
            stream.SetLevel(9);

            var additionalAssets = new PathNormailzationDictionary();

            this.ExportDocument(stream, additionalAssets);
            this.ExportAdditionalAssets(stream, additionalAssets);
        }

        this.Response.End();
    }

    private void ExportAdditionalAssets(ZipOutputStream stream, PathNormailzationDictionary additionalAssets)
    {
        var buffer = new byte[32 * 1024];
        int read;

        // TODO: Request content of .axd resources
        foreach (var item in additionalAssets.Where(item => File.Exists(Server.MapPath(item.Key))))
        {
            var entry = new ZipEntry(item.Value);

            stream.PutNextEntry(entry);

            using (var fileStream = File.OpenRead(Server.MapPath(item.Key))) 
            {
                while ((read = fileStream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    stream.Write(buffer, 0, read);
                }
            }
        }
    }

    private void ExportDocument(ZipOutputStream stream, PathNormailzationDictionary additionalAssets)
    {
        var entry = new ZipEntry("index.html");

        stream.PutNextEntry(entry);

        var document = this.GetNormalizedDocument(additionalAssets);

        var writer = new StreamWriter(stream);
        writer.Write(document);
        writer.Flush();
    }

    private string GetNormalizedDocument(PathNormailzationDictionary additionalAssets); // body omitted in the original post

1 Answer:

Answer 0: (score: 2)

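Yes, you have to create another WebRequest. Loading any given HTML page involves multiple HTTP requests: one for the HTML page itself and one more for each external src. There is no getting around it.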

-Oisin
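
As a rough illustration of the second request the answer describes, here is a minimal sketch that resolves the relative .axd URL against the page URL and copies the response body into an output stream. The class name AxdFetcher, both method names, and the optional cookieHeader parameter are hypothetical and not part of the original post. Forwarding the cookie header is only an assumption, on the guess that the handler may need the caller's session to find the image.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper, not from the original post.
public static class AxdFetcher
{
    // Resolves a relative URL such as "/ChartImg.axd?i=...&g=..." against
    // the URL of the page being exported (for example this.Request.Url).
    public static Uri ResolveAssetUri(Uri pageUrl, string relativeUrl)
    {
        return new Uri(pageUrl, relativeUrl);
    }

    // Issues a separate HTTP GET for the resource and copies the response
    // body into 'target'. The target stream is left open for the caller.
    public static void CopyResourceTo(Uri resourceUri, Stream target, string cookieHeader)
    {
        var request = (HttpWebRequest)WebRequest.Create(resourceUri);
        request.Method = "GET";

        // Assumption: the handler may rely on the caller's session, so the
        // incoming Cookie header is forwarded when one is supplied.
        if (!string.IsNullOrEmpty(cookieHeader))
        {
            request.Headers[HttpRequestHeader.Cookie] = cookieHeader;
        }

        var buffer = new byte[32 * 1024];
        int read;

        using (var response = request.GetResponse())
        using (var responseStream = response.GetResponseStream())
        {
            // Same manual copy loop the question uses for files on disk.
            while ((read = responseStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                target.Write(buffer, 0, read);
            }
        }
    }
}
```

Inside ExportAdditionalAssets, an entry whose key points at an .axd handler could then be written by calling stream.PutNextEntry(new ZipEntry(item.Value)) followed by AxdFetcher.CopyResourceTo(AxdFetcher.ResolveAssetUri(this.Request.Url, item.Key), stream, this.Request.Headers["Cookie"]). The File.Exists filter in the question would skip such entries, so they would need their own loop or condition.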