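Translating a web page with the Microsoft Translator API (SOAP) in C#

I want to translate my website, but a translation widget won't work for me because I need Google to crawl the translated pages. So I have to translate each page before it is sent to the browser.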
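As far as I can tell, there is no API that takes a URL and sends back the translated page. I tried to find one and couldn't; if you happen to know of one, please mention it.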
These are the attempts I've made so far:

1. Download the page from the URL as a string and pass it to Client.Translate(..).
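This fails with: The formatter threw an exception while trying to deserialize the message: Error in deserializing body of request message for operation 'Translate'. The maximum string content length quota (30720) has been exceeded while reading XML data. This quota may be increased by changing the MaxStringContentLength property on the XmlDictionaryReaderQuotas object used when creating the XML reader. Line 516, position 48.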
2. Load the page into HtmlAgilityPack and translate each non-empty text node separately:
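```csharp
private static void processDocument(HtmlAgilityPack.HtmlDocument html, LanguageServiceClient Client)
{
    // Select every text node that contains something other than whitespace.
    HtmlNodeCollection coll = html.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']");
    if (coll == null) return; // SelectNodes returns null when nothing matches

    foreach (HtmlNode node in coll)
    {
        // Translate the node only when its text and its raw HTML are identical.
        if (node.InnerText == node.InnerHtml)
        {
            // One SOAP request per text node.
            node.InnerHtml = Client.Translate("", node.InnerText, "en", "fr", "text/html", "general");
        }
    }
}
```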
This one took far too long, and in the end I got a Bad Request (400) exception.
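Sending one request per text node means one SOAP round trip per node, which is likely where most of the time in the second attempt goes. A common alternative is to pack consecutive short strings into batches whose combined length stays under a size limit, so each batch can go out in a single request. The sketch below shows only the packing step; `TextBatcher` and the limit you pass in are hypothetical, and how a batch is actually sent (joined with a separator your markup never contains, or through an array-translation operation if your service proxy exposes one) depends on your client and is not shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: greedily packs consecutive strings into batches whose
// combined length stays within maxBatchLength. A single string longer than the
// limit gets a batch of its own and still has to be split before it is sent.
// Assumes no null items.
public static class TextBatcher
{
    public static List<List<string>> Pack(IEnumerable<string> items, int maxBatchLength)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (maxBatchLength <= 0) throw new ArgumentOutOfRangeException(nameof(maxBatchLength));

        var batches = new List<List<string>>();
        var current = new List<string>();
        int currentLength = 0;

        foreach (string item in items)
        {
            // Adding this item would overflow the current batch: close it first.
            if (current.Count > 0 && currentLength + item.Length > maxBatchLength)
            {
                batches.Add(current);
                current = new List<string>();
                currentLength = 0;
            }
            current.Add(item);
            currentLength += item.Length;
        }

        // Flush the last, partially filled batch.
        if (current.Count > 0) batches.Add(current);
        return batches;
    }
}
```

For example, packing `"aaaa", "bbb", "cc", "dddddd"` with a limit of 7 gives three batches: `["aaaa", "bbb"]`, `["cc"]` and `["dddddd"]`. Keeping the items in document order makes it easy to write each translated piece back to the node it came from.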
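What is the best way to solve this? I also plan to save the translated documents so that I don't have to translate them on every request.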
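For saving the translations, a minimal sketch of a file-based cache with one file per (page URL, target language) pair. The `TranslatedPageCache` class is hypothetical, and the `translate` delegate is a placeholder for whatever produces the translated HTML (for example a wrapper around `Client.Translate`); it only runs on a cache miss.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical file-based cache for translated pages.
public sealed class TranslatedPageCache
{
    // UTF-8 without a byte-order mark, so no BOM is prepended to cached pages.
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(false);
    private readonly string _directory;

    public TranslatedPageCache(string directory)
    {
        _directory = directory ?? throw new ArgumentNullException(nameof(directory));
        Directory.CreateDirectory(_directory); // no-op if it already exists
    }

    // Returns the cached translation if present; otherwise calls translate(),
    // stores the result on disk and returns it.
    public string GetOrTranslate(string pageUrl, string targetLanguage, Func<string> translate)
    {
        if (translate == null) throw new ArgumentNullException(nameof(translate));

        string path = PathFor(pageUrl, targetLanguage);
        if (File.Exists(path))
            return File.ReadAllText(path, Utf8NoBom);

        string translated = translate();
        File.WriteAllText(path, translated, Utf8NoBom);
        return translated;
    }

    // Hashes language + URL into a fixed-length, filesystem-safe file name.
    private string PathFor(string pageUrl, string targetLanguage)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(targetLanguage + "\n" + pageUrl));
            string name = BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
            return Path.Combine(_directory, name + ".html");
        }
    }
}
```

This sketch deliberately leaves out two things a production version would need: there is no invalidation when the source page changes, and two requests missing the cache for the same page at the same moment will both translate and write the file.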
Answer (score: 1)
This C# sample translates the HTML in a local file:
```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using HtmlAgilityPack;

namespace TranslationAssistant.Business
{
    class HTMLTranslationManager
    {
        // Translates an HTML file in place from fromlanguage to tolanguage.
        public static int DoTranslation(string htmlfilename, string fromlanguage, string tolanguage)
        {
            string htmldocument = File.ReadAllText(htmlfilename);
            HtmlDocument htmlDoc = new HtmlDocument();
            htmlDoc.LoadHtml(htmldocument);

            // Put the target language code on the <html> element so it is saved with the page.
            var htmlElement = htmlDoc.DocumentNode.SelectSingleNode("//html");
            if (htmlElement != null)
                htmlElement.SetAttributeValue("lang", TranslationServices.Core.TranslationServiceFacade.LanguageNameToLanguageCode(tolanguage));

            var title = htmlDoc.DocumentNode.SelectSingleNode("//head//title");
            if (title != null) title.InnerHtml = TranslationServices.Core.TranslationServiceFacade.TranslateString(title.InnerHtml, fromlanguage, tolanguage, "text/html");

            var body = htmlDoc.DocumentNode.SelectSingleNode("//body");
            if (body != null)
            {
                if (body.InnerHtml.Length < 10000)
                {
                    // Small enough: translate the whole body in a single request.
                    body.InnerHtml = TranslationServices.Core.TranslationServiceFacade.TranslateString(body.InnerHtml, fromlanguage, tolanguage, "text/html");
                }
                else
                {
                    // Too big: collect child nodes under 10000 characters and translate them in parallel.
                    List<HtmlNode> nodes = new List<HtmlNode>();
                    AddNodes(body.FirstChild, ref nodes);
                    Parallel.ForEach(nodes, (node) =>
                    {
                        if (node.InnerHtml.Length > 10000)
                        {
                            throw new Exception("Child node with a length of more than 10000 characters encountered.");
                        }
                        node.InnerHtml = TranslationServices.Core.TranslationServiceFacade.TranslateString(node.InnerHtml, fromlanguage, tolanguage, "text/html");
                    });
                }
            }
            htmlDoc.Save(htmlfilename, Encoding.UTF8);
            return 1;
        }

        /// <summary>
        /// Add nodes of size smaller than 10000 characters to the list, and recurse into the bigger ones.
        /// </summary>
        /// <param name="firstnode">The first node of the sibling list to scan</param>
        /// <param name="nodes">Reference to the node list</param>
        private static void AddNodes(HtmlNode firstnode, ref List<HtmlNode> nodes)
        {
            // DNT = Do Not Translate: nodes with these names are skipped.
            string[] DNTList = { "script", "#text", "code", "col", "colgroup", "embed", "em", "#comment", "image", "map", "media", "meta", "source", "xml" };

            // Visit every sibling, including the last one, and stop at the end of the list.
            for (HtmlNode child = firstnode; child != null; child = child.NextSibling)
            {
                if (DNTList.Contains(child.Name.ToLowerInvariant())) continue;

                if (child.InnerHtml.Length > 10000)
                {
                    AddNodes(child.FirstChild, ref nodes); // too big: descend into its children
                }
                else if (child.InnerHtml.Trim().Length != 0)
                {
                    nodes.Add(child); // small enough to send in one request
                }
            }
        }
    }
}
```
This is HTMLTranslationManager.cs from http://github.com/microsofttranslator/documenttranslator. It uses the helper function TranslateString() from TranslationServiceFacade.cs; you can simplify it and put your own translation service call in place of TranslateString().
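The answer keeps every request under 10000 characters by splitting along DOM nodes. For plain text (not HTML), a simpler way to stay under a request-size limit such as the 30720-character quota in the question's error message is to cut the string at whitespace. A minimal sketch; `TextChunker` and the limit you pass in are hypothetical, not part of the Translator API. Cutting HTML at arbitrary whitespace can split a tag in half, so for markup the node-based approach above is the safer route.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits text into contiguous pieces of at most maxLength
// characters, preferring to break just after a whitespace character so words
// are not cut in half. Falls back to a hard cut when a window has no whitespace.
public static class TextChunker
{
    public static List<string> Split(string text, int maxLength)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (maxLength <= 0) throw new ArgumentOutOfRangeException(nameof(maxLength));

        var chunks = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            // The rest fits in one chunk.
            if (text.Length - start <= maxLength)
            {
                chunks.Add(text.Substring(start));
                break;
            }

            // Search backwards from the hard limit for the last whitespace character.
            int cut = -1;
            for (int i = start + maxLength; i > start; i--)
            {
                if (char.IsWhiteSpace(text[i - 1])) { cut = i; break; }
            }
            if (cut == -1) cut = start + maxLength; // no whitespace: hard cut

            chunks.Add(text.Substring(start, cut - start));
            start = cut;
        }
        return chunks;
    }
}
```

For example, `TextChunker.Split(text, 10000)` yields pieces of at most 10000 characters whose concatenation is exactly the original text, so the translated pieces can be joined back in order.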