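Scraping web page data through a proxy

Time: 2016-04-13 19:20:11

Tags: c# web screen-scraping

The following code fetches the source of a website entered by the user. I want to do the same thing, but through a proxy supplied by the user.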

Console.WriteLine("Enter path");
string fileName = Console.ReadLine();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

if (response.StatusCode == HttpStatusCode.OK)
{
    Console.WriteLine("Page OK");
    Stream receiveStream = response.GetResponseStream();
    StreamReader readStream = null;

    if (response.CharacterSet == null)
    {
        readStream = new StreamReader(receiveStream);
    }
    else
    {
        readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
    }

    string data = readStream.ReadToEnd();


    response.Close();
    readStream.Close();
    Console.WriteLine(data);

    System.IO.File.WriteAllText(fileName, data);
}

I tried the following code, but I get an error: System.UriFormatException

Console.WriteLine("proxy ip:");
string proxyip = Console.ReadLine();
Console.WriteLine("port");
string proxyport = Console.ReadLine();
string proxyaddress = (proxyip + ":" + proxyport);
HttpWebRequest requestproxy = (HttpWebRequest)WebRequest.Create("url");
WebProxy myproxy = new WebProxy(proxyaddress, false);
requestproxy.Proxy = myproxy;
HttpWebResponse responseproxy = (HttpWebResponse)requestproxy.GetResponse();
Console.WriteLine("file path:");
string fileName = Console.ReadLine();

if (responseproxy.StatusCode == HttpStatusCode.OK)
{
    Console.WriteLine("Page OK");
    Stream receiveStream = responseproxy.GetResponseStream();
    StreamReader readStream = null;

    if (responseproxy.CharacterSet == null)
    {
        readStream = new StreamReader(receiveStream);
    }
    else
    {
        readStream = new StreamReader(receiveStream, Encoding.GetEncoding(responseproxy.CharacterSet));
    }

    string data = readStream.ReadToEnd();

    responseproxy.Close();
    readStream.Close();
    Console.WriteLine(data);
    System.IO.File.WriteAllText(fileName, data);
}

What is wrong with the code above?

2 answers:

Answer 0: (score: 0)

The applicable WebProxy constructor expects either a string (a URL) or a Uri as its first argument.

Source: https://msdn.microsoft.com/en-us/library/system.net.webproxy.webproxy(v=vs.110).aspx

A host name + ":" + port number does not qualify as a URL in a string. You need "http://xxxxxx" or "https://xxxxx".
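One way to sidestep the question of what the string overload accepts is to build the proxy address as an explicit Uri and pass that to the constructor. The following is only a minimal sketch: the class and method names (ProxyExample, BuildProxy) are made up for illustration, the "http" scheme is an assumption about the proxy, and the sample address 127.0.0.1:8080 is a placeholder.

```csharp
using System;
using System.Net;

static class ProxyExample
{
    // Hypothetical helper: builds a WebProxy from the host and port the user typed.
    // Assumes the proxy speaks plain HTTP; adjust the scheme if yours differs.
    public static WebProxy BuildProxy(string host, string portText)
    {
        int port;
        if (!int.TryParse(portText, out port) || port < 1 || port > 65535)
        {
            throw new ArgumentException("Port must be a number between 1 and 65535.", "portText");
        }

        // UriBuilder assembles scheme, host and port into an absolute URI.
        Uri proxyUri = new UriBuilder("http", host, port).Uri;

        // false = do not bypass the proxy for local addresses, as in the question.
        return new WebProxy(proxyUri, false);
    }

    static void Main()
    {
        // Placeholder values standing in for the Console.ReadLine() input.
        WebProxy proxy = BuildProxy("127.0.0.1", "8080");
        Console.WriteLine(proxy.Address);
    }
}
```

The returned object can then be assigned to requestproxy.Proxy exactly as in the question.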

Answer 1: (score: 0)

In the first example, you pass in a string variable:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);

In the second example, you forgot to change "url" to the urlAddress string.

HttpWebRequest requestproxy = (HttpWebRequest)WebRequest.Create("url");

This causes the System.UriFormatException error.
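To see why the literal "url" is rejected, and to catch this kind of mistake before the request is created, the input can be checked with Uri.TryCreate. This is just a sketch; UrlCheck and IsHttpUrl are illustrative names, and restricting the check to http/https is an assumption about what the scraper should accept.

```csharp
using System;

static class UrlCheck
{
    // Hypothetical helper: true only for absolute http or https URLs.
    public static bool IsHttpUrl(string text)
    {
        Uri uri;
        return Uri.TryCreate(text, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    static void Main()
    {
        Console.WriteLine(IsHttpUrl("url"));                 // False: no scheme, not an absolute URI
        Console.WriteLine(IsHttpUrl("http://example.com/")); // True
    }
}
```

Calling such a check on the user's input before WebRequest.Create lets the program print a readable message instead of failing with an exception.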