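I have no other option here because I can't change the program, so I need to programmatically remove the percent-sign garbage formatting that appears in lines of text.
The query returns a string like this: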
'%3CSPAN style='FONT-SIZE: 12pt; FONT-FAMILY: %22Times New Roman%22,%22serif%22; mso-fareast-font-family: %22Times New Roman%22; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA'%3E%3CFONT color=#000000%3E3/20/18: Mrs. McDoogal completed a medical assessment with Dr. John Zoidberg, MD, at Futurama on 4/6/15 and he completed a new substance assessment on 4/14/18.%3C/FONT%3E%3CSPAN style=%22mso-spacerun: yes%22%3E%3CFONT color=#000000%3E %3C/FONT%3E%3C/SPAN%3E%3CFONT color=#000000%3EMrs. McDoogal is diagnosed with Foobar I diagnosis of Groovy Mind, Foo; Cartoon Dependence; and Fiddling Disorder. %3C/FONT%3E%3CSPAN style=%22mso-spacerun: yes%22%3E%3CFONT color=#000000%3E %3C/FONT%3E%3C/SPAN%3E%3CFONT color=#000000%3EMr. McDoogal is prescribed DDT 30 mg. and LSD 150 mg ABC.%3C/FONT%3E%3CSPAN style=%22mso-spacerun: yes%22%3E%3CFONT color=#000000%3E %3C/FONT%3E%3C/SPAN%3E%3CFONT color=#000000%3EMr. McDoogal will be enrolled in the day treatment program at Futurama.%3C/FONT%3E%3CSPAN style=%22mso-spacerun: yes%22%3E%3CFONT color=#000000%3E %3C/FONT%3E%3C/SPAN%3E%3C/SPAN%3E'
I want to remove things like this:
.%3C/FONT%3E%3CSPAN style=%22mso-spacerun: yes%22%3E%3CFONT color=#000000%3E %3C/FONT%3E%3C/SPAN%3E%3C/SPAN%3E
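What is the name of the thing I'm trying to remove?
Answer 0 (score: 0)
The %XX sequences are percent-encoded (URL-encoded) characters. If you do a manual search and replace on the sample data using the values below, you end up with an HTML fragment.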
Value Character
%3C <
%3E >
%22 "
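Making those replacements produces the following code. It is formatted here for readability; if the original contains no line-terminating characters, it will probably be a single line.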
<SPAN style='FONT-SIZE: 12pt; FONT-FAMILY: "Times New Roman","serif"; mso-fareast-font-family: "Times New Roman"; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA'>
<FONT color=#000000>3/20/18: Mrs. McDoogal completed a medical assessment with Dr. John Zoidberg, MD, at Futurama on 4/6/15 and he completed a new substance assessment on 4/14/18.</FONT>
<SPAN style="mso-spacerun: yes">
<FONT color=#000000> </FONT>
</SPAN>
<FONT color=#000000>Mrs. McDoogal is diagnosed with Foobar I diagnosis of Groovy Mind, Foo; Cartoon Dependence; and Fiddling Disorder. </FONT>
<SPAN style="mso-spacerun: yes">
<FONT color=#000000> </FONT>
</SPAN>
<FONT color=#000000>Mr. McDoogal is prescribed DDT 30 mg. and LSD 150 mg ABC.</FONT>
<SPAN style="mso-spacerun: yes">
<FONT color=#000000> </FONT>
</SPAN>
<FONT color=#000000>Mr. McDoogal will be enrolled in the day treatment program at Futurama.</FONT>
<SPAN style="mso-spacerun: yes">
<FONT color=#000000> </FONT>
</SPAN>
</SPAN>
You can do this in C# with the String.Replace method:
public static string ReplaceGarbage(string garbageString)
{
    // Map each percent-encoded sequence back to its literal character.
    return garbageString.Replace(@"%3C", @"<")
                        .Replace(@"%3E", @">")
                        .Replace(@"%22", @"""");
}
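Because these sequences are standard percent-encoding, a general-purpose decoder can handle all of them without listing each replacement by hand. The following is a minimal sketch, assuming that System.Uri.UnescapeDataString is available in your environment and that the text contains no literal percent signs you need to preserve; the PercentDecoder class and its Decode method are hypothetical names chosen for illustration.

```csharp
using System;

public static class PercentDecoder
{
    // Hypothetical helper: decodes every %XX escape (%3C, %3E, %22, ...) in one pass.
    public static string Decode(string encoded)
    {
        return Uri.UnescapeDataString(encoded);
    }
}
```

For example, PercentDecoder.Decode("%3CFONT color=#000000%3E") should give back <FONT color=#000000>. The advantage over chained Replace calls is that encoded characters not present in the sample are also covered.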
After that, stripping the tags (if that's what you need) so that only the body text remains should be a relatively easy job.
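// Requires: using System.Text.RegularExpressions;
public static string StripTagsRegex(string source)
{
    // Lazily match everything from '<' to the next '>' and remove it.
    return Regex.Replace(source, "<.*?>", string.Empty);
}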
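Putting the two steps together, one possible way to go from the raw query result to plain text is sketched below. This is only an illustration under some assumptions: the NoteCleaner class and ToPlainText method are hypothetical names, Uri.UnescapeDataString is swapped in for the chained Replace calls, and the regex approach is assumed to be good enough for this Word-generated markup (it is not a full HTML parser and would misbehave on attribute values that contain a '>' character).

```csharp
using System;
using System.Text.RegularExpressions;

public static class NoteCleaner
{
    // Hypothetical helper: percent-decode, drop tags, then tidy whitespace.
    public static string ToPlainText(string encoded)
    {
        // Step 1: turn %3C, %3E, %22 and other escapes back into characters.
        string html = Uri.UnescapeDataString(encoded);

        // Step 2: remove tags. Singleline lets '.' match newlines,
        // so a tag that spans several lines is still removed.
        string text = Regex.Replace(html, "<.*?>", string.Empty, RegexOptions.Singleline);

        // Step 3: collapse whitespace runs left behind by the spacer elements.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

The whitespace step is there because the "mso-spacerun" spans in the sample each contain a single space, so removing only the tags can leave uneven spacing between sentences.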