我从服务器收到一些XML,有时候我会在反序列化之前删除一些无效字符。我无法控制我收到的XML文件,因此我需要自己检查无效字符。
示例XML .....
<PrintStatus>N</PrintStatus>
<CustomerPO> >>>> pearl <<<<< </CustomerPO>
<Description>PO# pearl</Description>
<BranchID>4</BranchID>
<PostDate>
<Date>01/13/2015</Date>
</PostDate>
<ShipDate>
<Date>01/13/2015</Date>
</ShipDate>
如您所见,客户部分包含我需要删除的无效字符。这有时仅出现在包含用户类型数据的某些元素中。
这是我的回复代码.....
//configure http request
HttpWebRequest httpRequest = WebRequest.Create(url) as HttpWebRequest;
httpRequest.Method = "POST";
//prepare correct encoding for XML serialization
UTF8Encoding encoding = new UTF8Encoding();
//use Xml property to obtain serialized XML data
//convert into bytes using encoding specified above and get length
byte[] bodyBytes = encoding.GetBytes(Xml);
httpRequest.ContentLength = bodyBytes.Length;
//get http request stream for putting XML data into
Stream httpRequestBodyStream = httpRequest.GetRequestStream();
//fill stream with serialized XML data
httpRequestBodyStream.Write(bodyBytes, 0, bodyBytes.Length);
httpRequestBodyStream.Close();
//get http response
HttpWebResponse httpResponse = httpRequest.GetResponse() as HttpWebResponse;
StreamReader httpResponseStream = new StreamReader(httpResponse.GetResponseStream(), System.Text.Encoding.ASCII);
//extract XML from response
string httpResponseBody = httpResponseStream.ReadToEnd();
httpResponseStream.Close();
//ignore everything that isn't XML by removing headers
httpResponseBody = httpResponseBody.Substring(httpResponseBody.IndexOf("<?xml"));
//deserialize XML into ProductInquiryResponse
XmlSerializer serializer = new XmlSerializer(typeof(MyResponseClass));
StringReader responseReader = new StringReader(httpResponseBody);
//return MyResponseClass result
return serializer.Deserialize(responseReader) as MyResponseClass;
有没有人碰巧有任何检查XML的建议?我应该在xml字符串反序列化之前检查我关注的元素吗?或者,还有更好的方法?
答案 0 :(得分:0)
您的问题的一般修复方法是以递归方式下降XML,随时解析并与该节点的架构进行比较。在任何时候,如果输入与模式预期的输入不同,或者以某种方式出错,请允许错误处理程序运行以修复输入流,回滚到最近的良好状态并继续前进固定输入。
.Net XmlTextReader
类不够灵活,无法做到这一点。但是,如果您事先知道某些XML元素不能拥有子级的架构,则以下内容将读取XML输入流,并在遇到其完全限定名称与已知名称匹配的元素时叶子节点,&#34;逃避&#34;所有这些节点的文本:
public enum XmlDoctorStatus
{
NoFixNeeded,
FixMade,
FixFailed
}
public class XmlDoctor
{
internal class XmlFixData
{
public string InitialXml { get; private set; }
public string FixedXml { get; private set; }
public int LineNumber { get; private set; }
public int LinePosition { get; private set; }
public XmlFixData(string initialXml, string fixedXml, int lineNumber, int linePosition)
{
this.InitialXml = initialXml;
this.FixedXml = fixedXml;
this.LineNumber = lineNumber;
this.LinePosition = linePosition;
}
public bool ComesAfter(XmlFixData other)
{
if (LineNumber > other.LineNumber)
return true;
if (LineNumber == other.LineNumber && LinePosition > other.LinePosition)
return true;
return false;
}
}
internal class XmlFixedException : Exception
{
public XmlFixData XmlFixData { get; private set; }
public XmlFixedException(XmlFixData data)
{
this.XmlFixData = data;
}
}
readonly HashSet<XName> childlessNodes;
public string OriginalXml { get; private set; }
public XmlDoctor(string xml, IEnumerable<XName> childlessNodes)
{
if (xml == null)
throw new ArgumentNullException();
this.OriginalXml = xml;
this.childlessNodes = new HashSet<XName>(childlessNodes);
}
List<int> indices = null;
string passXml = string.Empty;
bool inPass = false;
void InitializePass(string xml)
{
if (inPass)
throw new Exception("nested pass");
ClearElementData();
TextHelper.NormalizeLines(xml, out passXml, out indices);
inPass = true;
}
void EndPass()
{
inPass = false;
indices = null;
passXml = string.Empty;
ClearElementData();
}
static int LineNumber(XmlReader reader)
{
return ((IXmlLineInfo)reader).LineNumber;
}
static int LinePosition(XmlReader reader)
{
return ((IXmlLineInfo)reader).LinePosition;
}
// Taken from https://stackoverflow.com/questions/1132494/string-escape-into-xml
public static string XmlEscape(string escaped)
{
var replacements = new KeyValuePair<string, string>[]
{
new KeyValuePair<string,string>("&", "&"),
new KeyValuePair<string,string>("\"", """),
new KeyValuePair<string,string>("'", "'"),
new KeyValuePair<string,string>("<", "<"),
new KeyValuePair<string,string>(">", ">"),
};
foreach (var pair in replacements)
foreach (var index in escaped.IndexesOf(pair.Key, 0).Reverse())
if (!replacements.Any(other => string.Compare(other.Value, 0, escaped, index, other.Value.Length, StringComparison.Ordinal) == 0))
{
escaped = escaped.Substring(0, index) + pair.Value + escaped.Substring(index + 1, escaped.Length - index - 1);
}
return escaped;
}
void HandleNode(XmlReader reader)
{
// Adapted from http://blogs.msdn.com/b/mfussell/archive/2005/02/12/371546.aspx
if (reader == null)
{
throw new ArgumentNullException("reader");
}
switch (reader.NodeType)
{
case XmlNodeType.Element:
HandleStartElement(reader);
if (reader.IsEmptyElement)
{
HandleEndElement(reader);
}
break;
case XmlNodeType.Text:
HandleText(reader);
break;
case XmlNodeType.Whitespace:
case XmlNodeType.SignificantWhitespace:
break;
case XmlNodeType.CDATA:
break;
case XmlNodeType.EntityReference:
break;
case XmlNodeType.XmlDeclaration:
case XmlNodeType.ProcessingInstruction:
break;
case XmlNodeType.DocumentType:
break;
case XmlNodeType.Comment:
break;
case XmlNodeType.EndElement:
HandleEndElement(reader);
break;
}
}
private void HandleText(XmlReader reader)
{
if (string.IsNullOrEmpty(currentElementLocalName) || string.IsNullOrEmpty(currentElementName))
return;
var name = XName.Get(currentElementLocalName, currentElementNameSpace);
if (!childlessNodes.Contains(name))
return;
var lineIndex = LineNumber(reader) - 1;
var charIndex = LinePosition(reader) - 1;
if (lineIndex < 0 || charIndex < 0)
return;
int startIndex = indices[lineIndex] + charIndex;
// Scan forward in the input string until we find either the beginning of a CDATA section or the end of this element.
// Patterns to match: </Name
//
string pattern1 = "</" + currentElementName;
var index1 = FindElementEnd(passXml, startIndex, pattern1);
if (index1 < 0)
return; // BAD XML.
string pattern2 = "<![CDATA[";
var index2 = passXml.IndexOf(pattern2, startIndex);
int endIndex = (index2 < 0 ? index1 : Math.Min(index1, index2));
var text = passXml.Substring(startIndex, endIndex - startIndex);
var escapeText = XmlEscape(text);
if (escapeText != text)
{
if (escapeText != XmlEscape(escapeText))
{
Debug.Assert(escapeText == XmlEscape(escapeText));
throw new InvalidOperationException("Escaping error");
}
string fixedXml = passXml.Substring(0, startIndex) + escapeText + passXml.Substring(endIndex, passXml.Length - endIndex);
throw new XmlFixedException(new XmlFixData(passXml, fixedXml, lineIndex + 1, charIndex + 1));
}
}
static bool IsXmlSpace(char ch)
{
// http://www.w3.org/TR/2000/REC-xml-20001006#NT-S
// [3] S ::= (#x20 | #x9 | #xD | #xA)+
return ch == '\u0020' || ch == '\u0009' || ch == '\u000D' || ch == '\u000A';
}
private static int FindElementEnd(string passXml, int charPos, string tagEnd)
{
while (true)
{
var index = passXml.IndexOf(tagEnd, charPos);
if (index < 0)
return index;
int endPos = index + tagEnd.Length;
if (index + tagEnd.Length >= passXml.Length)
return -1; // Bad xml?
// Now we must have zero or more white space characters and a ">"
while (endPos < passXml.Length && IsXmlSpace(passXml[endPos]))
endPos++;
if (endPos >= passXml.Length)
return -1; // BAD XML;
if (passXml[endPos] == '>')
return index;
index = endPos;
// Spurious ending, keep searching.
}
}
string currentElementName = string.Empty;
string currentElementNameSpace = string.Empty;
string currentElementLocalName = string.Empty;
private void HandleStartElement(XmlReader reader)
{
currentElementName = reader.Name;
currentElementLocalName = reader.LocalName;
currentElementNameSpace = reader.NamespaceURI;
}
private void HandleEndElement(XmlReader reader)
{
ClearElementData();
}
private void ClearElementData()
{
currentElementName = string.Empty;
currentElementNameSpace = string.Empty;
currentElementLocalName = string.Empty;
}
public XmlDoctorStatus TryFix(out string newXml)
{
XmlFixData data = null;
while (true)
{
XmlFixData newData;
var status = TryFixOnePass((data == null ? OriginalXml : data.FixedXml), out newData);
switch (status)
{
case XmlDoctorStatus.FixFailed:
Debug.WriteLine("Could not fix XML");
newXml = OriginalXml;
return XmlDoctorStatus.FixFailed;
case XmlDoctorStatus.FixMade:
if (data != null && !newData.ComesAfter(data))
{
Debug.WriteLine("Warning -- possible infinite loop detected, aborting fix");
newXml = OriginalXml;
return XmlDoctorStatus.FixFailed;
}
data = newData;
break; // Try to fix more
case XmlDoctorStatus.NoFixNeeded:
if (data == null)
{
newXml = OriginalXml;
return XmlDoctorStatus.NoFixNeeded;
}
else
{
newXml = data.FixedXml;
return XmlDoctorStatus.FixMade;
}
}
}
}
XmlDoctorStatus TryFixOnePass(string xml, out XmlFixData data)
{
try
{
InitializePass(xml);
using (var textReader = new StringReader(passXml))
using (XmlReader reader = XmlReader.Create(textReader))
{
while (true)
{
bool read = reader.Read();
if (!read)
break;
HandleNode(reader);
}
}
}
catch (XmlFixedException ex)
{
// Success - a fix was made.
data = ex.XmlFixData;
return XmlDoctorStatus.FixMade;
}
catch (Exception ex)
{
// Failure - the file was not fixed and could not be parsed.
Debug.WriteLine("Fix Failed: " + ex.ToString());
data = null;
return XmlDoctorStatus.FixFailed;
}
finally
{
EndPass();
}
// No fix needed.
data = null;
return XmlDoctorStatus.NoFixNeeded;
}
}
public static class TextHelper
{
public static void NormalizeLines(string text, out string newText, out List<int> lineIndices)
{
var sb = new StringBuilder();
var indices = new List<int>();
using (var sr = new StringReader(text))
{
string line;
while ((line = sr.ReadLine()) != null)
{
indices.Add(sb.Length);
sb.AppendLine(line);
}
}
lineIndices = indices;
newText = sb.ToString();
}
public static IEnumerable<int> IndexesOf(this string str, string value, int startAt)
{
if (str == null)
yield break;
for (int index = startAt, valueLength = value.Length; ; index += valueLength)
{
index = str.IndexOf(value, index);
if (index == -1)
break;
yield return index;
}
}
}
然后使用它:
public static class TestXmlDoctor
{
public static void TestFix()
{
string xml1 = @"<?xml version=""1.0"" encoding=""UTF-8""?>
<MainClass>
<PrintStatus>N</PrintStatus>
<CustomerPO> >>>> pearl <<<<< </CustomerPO>
<Description>PO# pearl</Description>
<BranchID>4</BranchID>
<PostDate>
<Date>01/13/2015</Date>
</PostDate>
<ShipDate>
<Date>01/13/2015</Date>
</ShipDate>
</MainClass>
";
XName[] childlessNodes1 = new XName[]
{
XName.Get("CustomerPO", string.Empty),
};
try
{
TestFix(xml1, childlessNodes1);
}
catch (Exception ex)
{
Debug.WriteLine(ex);
}
}
public static string TestFix(string xml, IEnumerable<XName> childlessNodes)
{
string fixedXml;
var status = (new XmlDoctor(xml, childlessNodes).TryFix(out fixedXml));
switch (status)
{
case XmlDoctorStatus.NoFixNeeded:
return xml;
case XmlDoctorStatus.FixFailed:
Debug.WriteLine("Failed to fix xml");
return xml;
case XmlDoctorStatus.FixMade:
Debug.WriteLine("Fixed XML, new XML is as follows:");
Debug.WriteLine(fixedXml);
Debug.WriteLine(string.Empty);
return fixedXml;
default:
Debug.Assert(false, "Unknown fix status " + status.ToString());
return xml;
}
}
}
这样,您的XML片段就可以被解析,并变为:
<?xml version="1.0" encoding="UTF-8"?>
<MainClass>
<PrintStatus>N</PrintStatus>
<CustomerPO> >>>> pearl <<<<< </CustomerPO>
<Description>PO# pearl</Description>
<BranchID>4</BranchID>
<PostDate>
<Date>01/13/2015</Date>
</PostDate>
<ShipDate>
<Date>01/13/2015</Date>
</ShipDate>
</MainClass>