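I wrote a program that downloads web pages. It works for most pages, but I found a few for which it does not.
These pages contain 0x00 characters.
I can read the page content that comes before such a character, but not the content after it.
I use this code to read the response: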
IAsyncResult ar = null;
HttpWebResponse resp = null;
Stream responseStream = null;
String content = null;
...
resp = (HttpWebResponse)req.EndGetResponse(ar);
responseStream = resp.GetResponseStream();
StreamReader sr = new StreamReader(responseStream, Encoding.UTF8);
content = sr.ReadToEnd();
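In this example I use an asynchronous request, but I also tried a synchronous one and had the same problem.
I also tried the following, with the same result: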
HttpWebResponse resp = null;
Stream responseStream = null;
String content = String.Empty; // String has no parameterless constructor
...
responseStream = resp.GetResponseStream();
byte[] buffer = new byte[4096];
int bytesRead = 1;
while (bytesRead > 0)
{
    bytesRead = responseStream.Read(buffer, 0, 4096);
    content += Encoding.UTF8.GetString(buffer, 0, bytesRead);
}
The problem occurs, for example, with this URL: http://www.daz3d.com/i/search/searchsub?sstring=ps_tx1662b&_m=dps_tx1662b
Thank you for your replies.
Euyeusu
Answer 0 (score: 1)
Your problem lies in converting the received content to a string; you need to remove those 0x00 bytes:
// Requires: using System; using System.IO; using System.Linq;
//           using System.Net; using System.Text; using System.Threading;
AutoResetEvent sync = new AutoResetEvent(false);
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://...");
request.Proxy.Credentials = CredentialCache.DefaultCredentials;
request.BeginGetResponse((result) =>
{
    StringBuilder content = new StringBuilder();
    using (HttpWebResponse response =
        request.EndGetResponse(result) as HttpWebResponse)
    using (Stream stream = response.GetResponseStream())
    {
        int read = 1;
        byte[] buffer = new byte[0x1000];
        while (read > 0)
        {
            read = stream.Read(buffer, 0, buffer.Length);
            // Take only the bytes actually read (the original "index <= read"
            // took one byte too many), then drop the 0x00 bytes.
            // Note: decoding each chunk separately can split a multi-byte
            // UTF-8 sequence that straddles a chunk boundary.
            content.Append(Encoding.UTF8.GetString(buffer
                .Take(read)
                .Where(b => b != 0x00).ToArray()));
        }
        Console.WriteLine(content);
        sync.Set();
    }
}, null);
sync.WaitOne();
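The byte filtering described in this answer can also be written so that the payload is decoded only once, which avoids cutting a multi-byte UTF-8 sequence at a buffer boundary. The following is only a minimal sketch under stated assumptions: `NullByteFilter` and `ReadWithoutNulls` are hypothetical names, not part of the original answer, and dropping 0x00 at byte level is only safe for encodings such as UTF-8 or ASCII where that byte never appears inside a multi-byte character (it would corrupt UTF-16).

```csharp
using System;
using System.IO;
using System.Text;

static class NullByteFilter
{
    // Hypothetical helper (not from the original answer): copies the stream
    // into memory while skipping 0x00 bytes, then decodes everything at once.
    public static string ReadWithoutNulls(Stream stream, Encoding encoding)
    {
        using (var filtered = new MemoryStream())
        {
            var buffer = new byte[0x1000];
            int read;
            // Stream.Read returns 0 at end of stream.
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] != 0x00) filtered.WriteByte(buffer[i]);
                }
            }
            // Decode once, after all chunks have been collected.
            return encoding.GetString(filtered.ToArray());
        }
    }
}
```

With the code above, the stream returned by `response.GetResponseStream()` could be passed as `NullByteFilter.ReadWithoutNulls(stream, Encoding.UTF8)`; any readable `Stream`, such as a `MemoryStream`, works the same way for trying it out.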
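Answer 1 (score: 0)
It is actually the encoding step that fails. To fix it, you have to filter out the 0x00 bytes. Something like this should do the trick: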
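using System.IO;
using System.Net;
using System.Text;

WebRequest request = WebRequest.Create("url here");
string html;
using (WebResponse response = request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (MemoryStream filtered = new MemoryStream())
{
    // ContentLength is -1 when the server does not send the header,
    // so collect the bytes in a growable MemoryStream, not a fixed array.
    int currentByte;
    while ((currentByte = stream.ReadByte()) > -1)
    {
        // Skip 0x00 bytes; keep everything else.
        if (currentByte > 0) filtered.WriteByte((byte)currentByte);
    }
    // Encoding.ASCII turns bytes above 0x7F into '?';
    // use Encoding.UTF8 instead if the page is UTF-8.
    html = Encoding.ASCII.GetString(filtered.ToArray());
}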
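As a possible variation on the same idea, the filtering can be done on characters after decoding rather than on raw bytes. This is only a sketch: `NullCharFilter` and `ReadAndStripNulls` are hypothetical names, and it assumes the decoded string still holds the NUL characters (U+0000) at that point, which has not been checked against the page from the question.

```csharp
using System;
using System.IO;
using System.Text;

static class NullCharFilter
{
    // Hypothetical helper: decodes the whole stream as UTF-8, then removes
    // every NUL character (U+0000) from the resulting string.
    public static string ReadAndStripNulls(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            string text = reader.ReadToEnd();
            // String.Replace(string, string) performs an ordinal search.
            return text.Replace("\0", string.Empty);
        }
    }
}
```

Filtering after decoding leaves the encoding choice to the `StreamReader`, so non-ASCII characters are kept rather than replaced.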