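I am using itextpdf version 5.5.6. I pass in HTML that contains a superscript tag, i.e. <sup>ABC</sup>,
along with other HTML content, but the ABC text is rendered as normal text. It looks as if the <sup> tag
is being escaped, so ABC appears as plain text. Below is the code that generates the PDF with itextpdf.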
CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
byte[] byte1=htmlBufferForPDF.toString().getBytes("UTF-8");
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
ByteArrayInputStream stream = new ByteArrayInputStream(byte1);
p.parse(stream, Charset.forName("UTF-8"));
Any suggestions for resolving this would be very helpful.
Thanks
Answer 0: (score: 2)
The following works with iTextSharp / XML Worker 5.5.11, using the overloaded ParseXHtml method and explicitly setting the CSS style.
HTML:
string HTML = @"
<html><head>
<title>Test HTML</title>
</head><body>
<div>The 1<sup>st</sup> day of the month</div>
</body></html>
";
Parsing code:
string css = "sup { vertical-align: super; font-size: 0.8em; }";
using (var stream = new MemoryStream())
{
    using (var document = new Document())
    {
        PdfWriter writer = PdfWriter.GetInstance(document, stream);
        document.Open();
        using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(HTML)))
        {
            using (var cssStream = new MemoryStream(Encoding.UTF8.GetBytes(css)))
            {
                XMLWorkerHelper.GetInstance().ParseXHtml(
                    writer, document, htmlStream, cssStream
                );
            }
        }
    }
    File.WriteAllBytes(OUTPUT, stream.ToArray());
}
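Since the question suspects that the <sup> tag is being escaped, it may be worth ruling that out on the string itself before it reaches the parser. The following is only a minimal sketch in plain Java with no iText dependency; the class and method names (SupMarkupCheck, hasEscapedSup, unescapeSup) are hypothetical helpers made up for illustration and are not part of iText or XML Worker.

```java
import java.util.Locale;

public class SupMarkupCheck {

    /** Returns true if the markup contains an entity-escaped sup tag such as &lt;sup&gt;. */
    public static boolean hasEscapedSup(String html) {
        String lower = html.toLowerCase(Locale.ROOT);
        return lower.contains("&lt;sup&gt;") || lower.contains("&lt;/sup&gt;");
    }

    /** Reverses entity escaping for sup tags only; all other content is left untouched. */
    public static String unescapeSup(String html) {
        return html
                .replaceAll("(?i)&lt;sup&gt;", "<sup>")
                .replaceAll("(?i)&lt;/sup&gt;", "</sup>");
    }

    public static void main(String[] args) {
        // Hypothetical input: the sup tag was escaped somewhere upstream.
        String escaped = "<div>The 1&lt;sup&gt;st&lt;/sup&gt; day of the month</div>";

        System.out.println(hasEscapedSup(escaped)); // true
        System.out.println(unescapeSup(escaped));   // <div>The 1<sup>st</sup> day of the month</div>
    }
}
```

If hasEscapedSup returns true for the contents of htmlBufferForPDF, the tag is being escaped upstream of the parser, and fixing it there is likely cleaner than unescaping afterwards.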
Answer 1: (score: 0)
This works for both the HTML5 view and the PDF view. It seems the PDF ignores the CSS but respects the markup...
import cookielib  # Python 2 module; needed for LWPCookieJar below
import mechanize

br = mechanize.Browser()
cookiejar = cookielib.LWPCookieJar()
br.set_cookiejar( cookiejar )
# Browser options
br.set_handle_equiv( True )
br.set_handle_gzip( True )
br.set_handle_redirect( True )
br.set_handle_referer( True )
br.set_handle_robots( False )
......
br.set_handle_refresh( mechanize._http.HTTPRefreshProcessor(), max_time = 1 )
br.addheaders = [ ( 'User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1' ) ]
# authenticate
....
url = "https://www.website.com/topics/?p=1"
# Keep the response in its own variable instead of overwriting url
response = br.open(url)
returnPage = response.read()
print returnPage