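I have a problem when downloading a page with HttpWebRequest.
Before creating the Uri, I escape the URL and pass the escaped string to the Uri constructor. But when HttpWebRequest downloads the page, it turns the escaped apostrophe (%27) back into a literal ' character. Strange.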
Original:
https://fr.wikipedia.org/wiki/Roi_Julian_!_L'Élu_des_lémurs
Escaped, and passed to the Uri constructor:
https://fr.wikipedia.org/wiki/Roi_Julian_!_L%27%C3%89lu_des_l%C3%A9murs
What HttpWebRequest sends to the server:
https://fr.wikipedia.org/wiki/Roi_Julian_!_L'%C3%89lu_des_l%C3%A9murs
Here is my test code:
using System;
using System.IO;
using System.Net;
using System.Text;

internal static class Program
{
    private static void Test()
    {
        var title = "Roi_Julian_!_L'Élu_des_lémurs";
        // Percent-encode the title before building the Uri.
        var url = "https://fr.wikipedia.org/wiki/" + Uri.EscapeDataString(title);
        var uri = new Uri(url);
        HttpWebDownload(uri);
    }

    private static void HttpWebDownload(Uri uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        request.AllowAutoRedirect = false;

        // Dispose the response and reader so the connection is released.
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            string pageResponse = reader.ReadToEnd();
            Console.WriteLine(pageResponse);
        }
    }
}
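To see which form of the URI each stage works with, it can help to print the Uri's properties side by side and compare them with the trace below. This is a minimal diagnostic sketch; the class name UriForms is hypothetical, and because Uri canonicalization differs between .NET Framework and .NET Core, no output is asserted for AbsoluteUri, PathAndQuery or ToString().

```csharp
using System;

public static class UriForms
{
    public static void Main()
    {
        var title = "Roi_Julian_!_L'Élu_des_lémurs";
        var url = "https://fr.wikipedia.org/wiki/" + Uri.EscapeDataString(title);
        var uri = new Uri(url);

        // OriginalString is exactly the string passed to the constructor.
        Console.WriteLine("OriginalString: " + uri.OriginalString);

        // AbsoluteUri is the canonicalized, escaped form of the whole URI.
        Console.WriteLine("AbsoluteUri:    " + uri.AbsoluteUri);

        // PathAndQuery is the escaped path plus query; compare it with the
        // GET line in the trace (how HttpWebRequest builds that line is assumed here).
        Console.WriteLine("PathAndQuery:   " + uri.PathAndQuery);

        // ToString() returns an unescaped display form (everything except
        // #, ? and % is unescaped), which resembles the first trace lines.
        Console.WriteLine("ToString():     " + uri.ToString());
    }
}
```

If the ' already appears in AbsoluteUri or PathAndQuery, the unescaping happens inside Uri itself rather than in HttpWebRequest.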
Here is the System.Net trace log:
System.Net Verbose: 0 : [10640] WebRequest::Create(https://fr.wikipedia.org/wiki/Roi_Julian_!_L'Élu_des_lémurs)
System.Net Verbose: 0 : [10640] HttpWebRequest#33111870::HttpWebRequest(https://fr.wikipedia.org/wiki/Roi_Julian_!_L'Élu_des_lémurs#-554901600)
System.Net Information: 0 : [10640] Current OS installation type is 'Server'.
System.Net Information: 0 : [10640] RAS supported: True
System.Net Verbose: 0 : [10640] Exiting HttpWebRequest#33111870::HttpWebRequest()
System.Net Verbose: 0 : [10640] Exiting WebRequest::Create() -> HttpWebRequest#33111870
System.Net Verbose: 0 : [10640] HttpWebRequest#33111870::GetResponse()
System.Net Error: 0 : [10640] Can't retrieve proxy settings for Uri 'https://fr.wikipedia.org/wiki/Roi_Julian_!_L'Élu_des_lémurs'. Error code: 12180.
System.Net Verbose: 0 : [10640] ServicePoint#66337667::ServicePoint(fr.wikipedia.org:443)
System.Net Information: 0 : [10640] Associating HttpWebRequest#33111870 with ServicePoint#66337667
System.Net Information: 0 : [10640] Associating Connection#35489797 with HttpWebRequest#33111870
System.Net Information: 0 : [10640] Connection#35489797 - Created connection from 10.168.184.78:55975 to 198.35.26.96:443.
System.Net Information: 0 : [10640] TlsStream#45795543::.ctor(host=fr.wikipedia.org, #certs=0)
System.Net Information: 0 : [10640] Associating HttpWebRequest#33111870 with ConnectStream#65677972
System.Net Information: 0 : [10640] HttpWebRequest#33111870 - Request: GET /wiki/Roi_Julian_!_L'%C3%89lu_des_l%C3%A9murs HTTP/1.1
System.Net Information: 0 : [10640] ConnectStream#65677972 - Sending headers
{
Host: fr.wikipedia.org
Connection: Keep-Alive
}.
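I thought this was related to the (obsolete) dontEscape parameter of the Uri constructor, so I wrote a function that sets the internal UserEscape flag via reflection. It did not work.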
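// Value of Uri's internal UserEscape flag (internal and undocumented).
private const ulong UserEscape = 0x00080000;

// Sets the internal UserEscape flag on a Uri via reflection.
// m_Flags is a private field of System.Uri in .NET Framework; it may not exist on other runtimes.
public static void EnableUserEscape(Uri uri)
{
    FieldInfo fieldInfo = typeof(Uri).GetField("m_Flags", BindingFlags.Instance | BindingFlags.NonPublic);
    if (fieldInfo == null)
    {
        throw new MissingFieldException("'m_Flags' field not found");
    }
    var uriFlags = (ulong)fieldInfo.GetValue(uri);
    uriFlags = uriFlags | UserEscape;
    fieldInfo.SetValue(uri, uriFlags);
}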
I call this function to enable UserEscape before passing the Uri to HttpWebDownload(), but then HttpWebRequest sends this URL to the server: https://fr.wikipedia.org/wiki/Roi_Julian_!_L'?lu_des_l?murs
Can anyone offer a solution?
Thanks
Answer 0 (score: 0)
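Define the URL in your test method as follows. Does that help?
A string literal written as @"..." is called a verbatim string literal. Backslash escape sequences are not processed inside it; the only special sequence is "", which stands for a single double-quote character, and the literal ends at the next unpaired ".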
var url = @"https://fr.wikipedia.org/wiki/Roi_Julian_!_L'Élu_des_lémurs";
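A verbatim literal only changes how the C# compiler reads backslashes and doubled quotes in source code; it does not percent-encode or otherwise transform characters at run time. The sketch below (the class name VerbatimDemo is hypothetical) shows that for this URL the verbatim and regular literals produce the identical string. So the substantive difference from the question's code seems to be that the URL is handed to Uri unescaped instead of being pre-escaped with Uri.EscapeDataString.

```csharp
using System;

public static class VerbatimDemo
{
    public static void Main()
    {
        // No backslashes or quotes inside: both literals are the same string.
        string regular = "https://fr.wikipedia.org/wiki/Roi_Julian_!_L'Élu_des_lémurs";
        string verbatim = @"https://fr.wikipedia.org/wiki/Roi_Julian_!_L'Élu_des_lémurs";
        Console.WriteLine(regular == verbatim);           // True

        // The difference only shows up with backslashes and doubled quotes.
        Console.WriteLine(@"C:\temp" == "C:\\temp");      // True
        Console.WriteLine(@"say ""hi""" == "say \"hi\""); // True
    }
}
```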