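I have a string that I need to parse into an XElement for further processing (I have no control over the input string; this is a simplified version of the actual XML, but it demonstrates the problem well enough):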
string inputXML = @"
<
blahblahblahblahblah>";
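I'm trying to remove the carriage return and the whitespace immediately after the opening left angle bracket (XElement won't parse the string while there is leading whitespace after the opening <). Here is what I have tried: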
//tried making new strings instead of reusing the existing one,
//didn't make any difference
string test = inputXML.Replace("\r\n","");
string test2 = test.Replace(@"^<\s+", "<");
Console.WriteLine(test2);
This produces a string that looks like this:
< blahblahblahblahblah>
instead of:
<blahblahblahblahblah>
In addition to the above, I have also tried:
inputXML.Replace(@"<[ ]+", "<"); //doesn't work
inputXML.Replace(@"< +", "<"); //doesn't work
inputXML.Replace(@"<\040+", "<"); //doesn't work
inputXML.Replace(@"< ", "<"); //works!, but not very useful and I don't
//understand why I need twice as many spaces as the actual number? Since I don't
//control the input, this isn't a solution, it only happens to work for this one.
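I'm pretty sure I'm missing something stupid. All of these regular expressions work at www.rubular.com; I realize that site is for Ruby, but it is handy for testing.
I'm also not married to regular expressions, so if you have another suggestion, I'm all ears.
I don't think it's relevant, but I'm testing this in LINQPad.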
Answer 0 (score: 2)
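You have two problems:
string.Replace does not work with regular expressions; it treats its arguments as literal text. Use Regex.Replace instead.
The ^ anchor means the < must appear at the very start of the string. If you just want to remove the whitespace after the <, remove the anchor.
Try this: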
string test = inputXML.Replace("\r\n","");
string test2 = Regex.Replace(test, @"<\s*", "<");
Console.WriteLine(test2); // " <blahblahblahblahblah>"
Alternatively, if you also want to remove any whitespace before the <, use:
string test = inputXML.Replace("\r\n","");
string test2 = Regex.Replace(test, @"\s*<\s*", "<");
Console.WriteLine(test2); // "<blahblahblahblahblah>"
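To make the two problems concrete, here is a minimal sketch comparing the three calls side by side. The input string is a hypothetical stand-in for the one in the question (leading spaces, a <, then a line break and more spaces), and the variable names are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: leading spaces, "<", a line break, then more spaces.
string input = "  <\r\n   blahblahblahblahblah>";

// string.Replace looks for the literal characters <\s+ which never occur
// in the input, so the string comes back unchanged.
string literal = input.Replace(@"<\s+", "<");
Console.WriteLine(literal == input);   // True

// With the ^ anchor the match must begin at the start of the string,
// but the string starts with spaces, so nothing is replaced.
string anchored = Regex.Replace(input, @"^<\s+", "<");
Console.WriteLine(anchored == input);  // True

// Without the anchor, every "<" followed by whitespace collapses to "<".
string unanchored = Regex.Replace(input, @"<\s+", "<");
Console.WriteLine(unanchored);         // "  <blahblahblahblahblah>"
```

Note that Regex.Replace replaces every match, not just the first one, so any later < followed by whitespace in a larger document would be collapsed as well.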
Answer 1 (score: 2)
Given that the XML content is arbitrary, it is probably safe to remove only the whitespace inside the tag. So:
string inputXML = @"
<
blahblahblahblahblah>";
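string pattern = @"(?<=\<)\s+"; // match one or more whitespace characters that follow a <
// RegexOptions.Multiline only changes how ^ and $ match, so it has no effect on this pattern
var cleaned = Regex.Replace(inputXML,
    pattern,
    string.Empty,
    RegexOptions.Multiline);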
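As a rough end-to-end sketch, the cleaned string can then be handed to XElement.Parse. The sample in the question has no closing tag and would not parse on its own, so the input below is a hypothetical well-formed variant; the element names root and child are assumptions made purely for illustration.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical, well-formed variant of the input from the question.
string raw = "\r\n<\r\n   root><child>text</child></root>";

// (?<=<) asserts that a "<" precedes the match without consuming it,
// so only the whitespace itself is removed.
string cleaned = Regex.Replace(raw, @"(?<=<)\s+", string.Empty);

// Trim the surrounding line break, then parse.
// XElement.Parse throws an XmlException if the text is not well-formed XML.
XElement element = XElement.Parse(cleaned.Trim());
Console.WriteLine(element.Name);                    // root
Console.WriteLine(element.Element("child").Value);  // text
```

One caveat with this approach: the pattern also strips whitespace after any < that appears inside comments or CDATA sections, which may or may not matter for the real input.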