This is the code I'm trying now, but neither code page works, not 1255 and not 862:
Encoding latinEncoding = Encoding.GetEncoding("Windows-1252");
Encoding hebrewEncoding = Encoding.GetEncoding(862);//"Windows-1255");
string name = anchor.InnerText;
byte[] latinBytes = latinEncoding.GetBytes(name);
string hebrewString = hebrewEncoding.GetString(latinBytes);
Maybe the problem is that the source text isn't Latin to begin with. What I see in the name variable is: ����� �����
instead of Hebrew letters.
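For illustration, here is a minimal sketch of how the � characters can appear. It assumes (this is not confirmed anywhere above) that the page bytes are Windows-1255 and that something decodes them as UTF-8 before they reach name. The class and method names are made up for the example.

```csharp
using System;
using System.Text;

static class LossyDecodeDemo
{
    // Hypothetical helper: encodes Hebrew text as Windows-1255 bytes,
    // then decodes those bytes as UTF-8, the way a UTF-8 reader would.
    public static string DecodeHebrewBytesAsUtf8(string hebrew)
    {
        // Needed on .NET Core / .NET 5+ so that code page 1255 is available;
        // on .NET Framework this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] raw = Encoding.GetEncoding(1255).GetBytes(hebrew);

        // The bytes are not valid UTF-8, so the decoder substitutes U+FFFD.
        return Encoding.UTF8.GetString(raw);
    }

    static void Main()
    {
        string garbled = DecodeHebrewBytesAsUtf8("שלום");
        Console.WriteLine(garbled.Contains("\uFFFD"));
    }
}
```

If that is what happens here, the original bytes are already gone by the time the string contains U+FFFD, so calling GetBytes and GetString on name afterwards cannot bring the Hebrew letters back.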
This is the complete method I'm using:
private void parseIds(string html)
{
    var htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.LoadHtml(html);
    var anchor = htmlDoc.DocumentNode.Descendants("a").FirstOrDefault();
    if (anchor != null)
    {
        Encoding latinEncoding = Encoding.GetEncoding("Windows-1252");
        Encoding hebrewEncoding = Encoding.GetEncoding(862);//"Windows-1255");
        string name = anchor.InnerText;
        byte[] latinBytes = latinEncoding.GetBytes(name);
        string hebrewString = hebrewEncoding.GetString(latinBytes);
        string href = anchor.Attributes["href"].Value;
        Uri uri;
        if (Uri.TryCreate(href, UriKind.RelativeOrAbsolute, out uri))
        {
            if (!uri.IsAbsoluteUri)
                uri = new Uri(new Uri("http://www.google.com/"), uri);
            var queryKeyValues = System.Web.HttpUtility.ParseQueryString(uri.Query);
            string forumId = queryKeyValues["forumId"];
        }
    }
}
This is how I call it in the constructor:
WebClient webclient = new WebClient();
webclient.DownloadFile("http://www.tapuz.co.il/forums/forumslistnew.asp", @"c:\testhtml\mainforums.html");
webclient.Dispose();
string[] lines = File.ReadAllLines(@"c:\testhtml\mainforums.html");
foreach (string line in lines)
{
    if (line.Contains("href") && line.Contains("forumId=") && !wholeids.Contains(line))
    {
        parseIds(line);
    }
}
Where should I do the encoding for Hebrew?
I tried using:
webclient.Encoding = System.Text.Encoding.UTF8;
once before the webclient.DownloadFile line and once after it, but it didn't change anything.
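One thing that may be worth trying, sketched under the assumption that the page is served as Windows-1255 (not verified here): DownloadFile writes the raw bytes to disk, so the decoding actually happens in File.ReadAllLines, which treats the file as UTF-8 when no encoding is passed. The helper name and the temp-file path below are made up for the example.

```csharp
using System;
using System.IO;
using System.Text;

static class HebrewFileDemo
{
    // Hypothetical helper: reads all lines of a file, decoding its bytes
    // with the given code page instead of the UTF-8 default.
    public static string[] ReadLinesAs(string path, int codePage)
    {
        // Needed on .NET Core / .NET 5+ so that code page 1255 is available;
        // on .NET Framework this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return File.ReadAllLines(path, Encoding.GetEncoding(codePage));
    }

    static void Main()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Simulate a downloaded page whose bytes are Windows-1255 (assumption).
        string path = Path.Combine(Path.GetTempPath(), "hebrew-demo.html");
        byte[] pageBytes = Encoding.GetEncoding(1255)
            .GetBytes("<a href=\"x?forumId=1\">שלום</a>");
        File.WriteAllBytes(path, pageBytes);

        // Decode with the same code page the bytes were written in.
        string[] lines = ReadLinesAs(path, 1255);
        Console.WriteLine(lines[0]);
    }
}
```

In the constructor above, this would amount to passing Encoding.GetEncoding(1255) as the second argument to File.ReadAllLines and dropping the Windows-1252 round trip in parseIds. If the site turns out to use a different charset, the same call with that encoding should apply.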