How to load an encoded HTML string with HtmlAgilityPack

Asked: 2016-04-19 21:54:44

Tags: html html-agility-pack

I am working on converting an HTML string to PDF. I am currently using ExpertPDF for this.

As part of the process, however, I first have to clean the HTML string and then pass the cleaned string to the library that creates the PDF.

The first cleaning step is to remove unwanted HTML tags, that is, any tag on a blacklist that I maintain. I use HtmlAgilityPack to parse the HTML string and remove those nodes.

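The second step is to auto-close known tags that were left open. When I load the HTML string with HtmlAgilityPack, it closes well-formed tags automatically. This is where I run into a problem: when the text contains something like "Reading < 5 then go visit the doctor", the less-than sign is treated as the start of a tag, and that tag is then closed at some later position.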

For example, given the following HTML:

> <p style="margin-top: 5px; margin-bottom: 5px; line-height: 16.9px;
> background-color: #ffffff;">1. The full evaluation and management of
> acute coronary syndrome, ST-elevation and non-ST-elevation myocardial
> infarction is beyond the scope of SOC tele-intensivist coverage. 2.
> Early cardiology consultation should be recommended in all cases of
> suspected acute coronary syndrome or myocardial infarction 3. However,
> one of these syndromes is suspected, it is reasonable to recommend
> starting therapy with ASA, beta-blockers, nitrates, and morphine
> sulfate. 4. In addition, full dose anticoagulation with
> anticoagulation (e.g. unfractionated heparin, enoxaparin, or
> fondaparinux), second antiplatelet agent (e.g. clopidegrol) and
> initiation of statin therapy (atorvastatin 80 mg orally daily) should
> be considered 5. For patients with heart failure or ejection fraction
> < 40%, ACE-inhibitors (or ARB’s if history of ACE-inhibitors
> intolerance) should be considered. 6. Other more routine options are
> listed below.</p>

After loading the HTML above with HtmlAgilityPack, the document node's outer XML looks like this:

> <p style="margin-top: 5px; margin-bottom: 5px; line-height: 16.9px;
> background-color: #ffffff;">1. The full evaluation and management of
> acute coronary syndrome, ST-elevation and non-ST-elevation myocardial
> infarction is beyond the scope of SOC tele-intensivist coverage. 2.
> Early cardiology consultation should be recommended in all cases of
> suspected acute coronary syndrome or myocardial infarction 3. However,
> one of these syndromes is suspected, it is reasonable to recommend
> starting therapy with ASA, beta-blockers, nitrates, and morphine
> sulfate. 4. In addition, full dose anticoagulation with
> anticoagulation (e.g. unfractionated heparin, enoxaparin, or
> fondaparinux), second antiplatelet agent (e.g. clopidegrol) and
> initiation of statin therapy (atorvastatin 80 mg orally daily) should
> be considered 5. For patients with heart failure or ejection fraction
> < 40%,="" ace-inhibitors="" (or="" arb’s="" if="" history="" of=""
> ace-inhibitors="" intolerance)="" should="" be="" considered.="" 6.=""
> other="" more="" routine="" options="" are="" listed=""></p>

The parser treats "< 40%" as a tag, closes it at the very end, and turns the words in between into attributes.

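Can you tell me what I have to do when loading the string into the HTML document so that "<" is not treated as the start of a tag? Is there a way to load the document as encoded text rather than as a plain string?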
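One direction that might help is to escape stray less-than signs before the string ever reaches the parser. The following is only a minimal sketch of that idea, written in Python for illustration and not verified against HtmlAgilityPack. It assumes a "<" opens real markup only when it is immediately followed by a letter, "/", "!" or "?", and it rewrites every other "<" as the entity "&lt;". The names `STRAY_LT` and `escape_stray_lt` are hypothetical. The same pattern could presumably be applied in .NET with `Regex.Replace` before loading the string.

```python
import re

# Assumption: a "<" starts real markup only when a letter, "/", "!" or "?"
# follows it immediately; any other "<" is treated as literal text.
STRAY_LT = re.compile(r"<(?![A-Za-z/!?])")


def escape_stray_lt(html: str) -> str:
    """Replace each stray "<" with "&lt;" and leave real tags untouched."""
    return STRAY_LT.sub("&lt;", html)


# Shortened version of the paragraph from the question.
sample = '<p style="line-height: 16.9px;">ejection fraction < 40%, ACE-inhibitors</p>'
cleaned = escape_stray_lt(sample)
print(cleaned)
```

This heuristic has limits: text such as "a<b" (a letter right after the sign) would still look like a tag, and a "<" inside script blocks or attribute values would be escaped as well, so the input would need to be checked for those cases.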

Thanks, Raj.

0 answers:

No answers yet