How to remove embedded formatting

Time: 2016-01-12 21:30:07

Tags: c#

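I have no other option here because I cannot change the program that produces the data, so I need to programmatically remove the percent-sign garbage formatting that appears in lines of text: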

The query returns a string like this:

'%3CSPAN style='FONT-SIZE: 12pt; FONT-FAMILY: %22Times New Roman%22,%22serif%22; mso-fareast-font-family: %22Times New Roman%22; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA'%3E%3CFONT color=#000000%3E3/20/18: Mrs. McDoogal completed a medical assessment with Dr. John Zoidberg, MD, at Futurama on 4/6/15 and he completed a new substance assessment on 4/14/18.%3C/FONT%3E%3CSPAN style=%22mso-spacerun: yes%22%3E%3CFONT color=#000000%3E  %3C/FONT%3E%3C/SPAN%3E%3CFONT color=#000000%3EMrs. McDoogal is diagnosed with Foobar I diagnosis of Groovy Mind, Foo; Cartoon Dependence; and Fiddling Disorder. %3C/FONT%3E%3CSPAN style=%22mso-spacerun: yes%22%3E%3CFONT color=#000000%3E %3C/FONT%3E%3C/SPAN%3E%3CFONT color=#000000%3EMr. McDoogal is prescribed DDT 30 mg. and LSD 150 mg ABC.%3C/FONT%3E%3CSPAN style=%22mso-spacerun: yes%22%3E%3CFONT color=#000000%3E  %3C/FONT%3E%3C/SPAN%3E%3CFONT color=#000000%3EMr. McDoogal will be enrolled in the day treatment program at Futurama.%3C/FONT%3E%3CSPAN style=%22mso-spacerun: yes%22%3E%3CFONT color=#000000%3E  %3C/FONT%3E%3C/SPAN%3E%3C/SPAN%3E'

I want to remove content like this:

.%3C/FONT%3E%3CSPAN style=%22mso-spacerun: yes%22%3E%3CFONT color=#000000%3E  %3C/FONT%3E%3C/SPAN%3E%3C/SPAN%3E

What is the name of the thing I am trying to remove?

1 answer:

Answer 0 (score: 0)

If you do a manual search and replace on the sample data using the values below, you end up with an HTML fragment.

Characters   Replace with
%3C          <
%3E          >
%22          "

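Making these replacements yields the following markup. It is formatted here for readability; if the original contains no line-terminating characters, the result will probably be a single line.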

<SPAN style='FONT-SIZE: 12pt; FONT-FAMILY: "Times New Roman","serif"; mso-fareast-font-family: "Times New Roman"; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA'>
    <FONT color=#000000>3/20/18: Mrs. McDoogal completed a medical assessment with Dr. John Zoidberg, MD, at Futurama on 4/6/15 and he completed a new substance assessment on 4/14/18.</FONT>
    <SPAN style="mso-spacerun: yes">
        <FONT color=#000000>  </FONT>
    </SPAN>
    <FONT color=#000000>Mrs. McDoogal is diagnosed with Foobar I diagnosis of Groovy Mind, Foo; Cartoon Dependence; and Fiddling Disorder. </FONT>
    <SPAN style="mso-spacerun: yes">
        <FONT color=#000000> </FONT>
    </SPAN>
    <FONT color=#000000>Mr. McDoogal is prescribed DDT 30 mg. and LSD 150 mg ABC.</FONT>
    <SPAN style="mso-spacerun: yes">
        <FONT color=#000000>  </FONT>
    </SPAN>
    <FONT color=#000000>Mr. McDoogal will be enrolled in the day treatment program at Futurama.</FONT>
    <SPAN style="mso-spacerun: yes">
        <FONT color=#000000>  </FONT>
    </SPAN>
</SPAN>

You can do this in C# with the String.Replace method:

// Replaces the three percent-encoded sequences found in the sample data.
public static string ReplaceGarbage(string garbageString)
{
    return garbageString.Replace("%3C", "<")
                        .Replace("%3E", ">")
                        .Replace("%22", "\"");
}
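
To answer the naming question: sequences such as %3C, %3E and %22 look like percent-encoding (also called URL encoding), where % is followed by the two hex digits of the character code. Assuming that is what the data contains, the framework's own decoder can replace the hand-written list. The following is only a sketch; the class and method names are made up for illustration, while Uri.UnescapeDataString is the standard .NET call.

```csharp
using System;

public static class PercentDecodingSketch
{
    // Hypothetical helper: decodes every %XX escape, not just the three
    // sequences listed in the table above.
    public static string DecodePercentEscapes(string encoded)
    {
        return Uri.UnescapeDataString(encoded);
    }

    public static void Main()
    {
        // Shortened excerpt in the style of the sample data.
        string sample = "%3CFONT color=#000000%3EMr. McDoogal%3C/FONT%3E";
        Console.WriteLine(DecodePercentEscapes(sample));
        // prints: <FONT color=#000000>Mr. McDoogal</FONT>
    }
}
```

Uri.UnescapeDataString is used here rather than WebUtility.UrlDecode because the latter also turns '+' into a space, which is probably not wanted for free text like this.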

After that, stripping the tags (if that is what you need) so that only the body text remains should be a relatively easy job.

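// Requires: using System.Text.RegularExpressions;
public static string StripTagsRegex(string source)
{
    // Lazily match everything from '<' to the next '>' and remove it.
    return Regex.Replace(source, "<.*?>", string.Empty);
}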
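
Putting the two steps together, a minimal end-to-end sketch might look like the following. It assumes the input is percent-encoded HTML like the sample, and it adds one step not in the answer above: collapsing the runs of spaces left behind by the mso-spacerun spans. The names CleanupSketch and ToPlainText are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CleanupSketch
{
    public static string ToPlainText(string encoded)
    {
        // Step 1: turn %3C, %3E, %22 and any other %XX escape back into characters.
        string html = Uri.UnescapeDataString(encoded);

        // Step 2: drop the tags; Singleline lets a tag span a line break.
        string text = Regex.Replace(html, "<.*?>", string.Empty, RegexOptions.Singleline);

        // Step 3 (an assumption about the desired output): collapse whitespace runs.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        string sample =
            "%3CFONT color=#000000%3EFirst sentence.%3C/FONT%3E" +
            "%3CSPAN style=%22mso-spacerun: yes%22%3E%3CFONT color=#000000%3E  %3C/FONT%3E%3C/SPAN%3E" +
            "%3CFONT color=#000000%3ESecond sentence.%3C/FONT%3E";
        Console.WriteLine(ToPlainText(sample));
        // prints: First sentence. Second sentence.
    }
}
```

A regular expression is enough for simple, well-formed fragments like this one, but it is not an HTML parser; markup containing '>' inside attribute values or comments would need a real parser.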