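How do I remove the whitespace between a left angle bracket and the text that follows it?

Asked: 2013-12-19 19:38:59

Tags: c# regex replace

I have a string that I need to parse into an XElement for further processing. I have no control over the input string; this is a simplified version of the actual XML, but it is enough to show the problem: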

string inputXML = @"
    <
    blahblahblahblahblah>";

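I am trying to remove the carriage return and the whitespace that come immediately after the opening left angle bracket (XElement will not parse the string when whitespace follows the opening bracket). Here is what I have tried: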

//tried making new strings instead of reusing the existing one,
//didn't make any difference
string test = inputXML.Replace("\r\n",""); 
string test2 = test.Replace(@"^<\s+", "<");
Console.WriteLine(test2);

This produces a string that looks like this:

<        blahblahblahblahblah>

instead of:

<blahblahblahblahblah>

In addition to the above, I have also tried:

inputXML.Replace(@"<[ ]+", "<");  //doesn't work
inputXML.Replace(@"< +", "<");  //doesn't work
inputXML.Replace(@"<\040+", "<");  //doesn't work
inputXML.Replace(@"<        ", "<"); //works!, but not very useful and I don't
//understand why I need twice as many spaces as the actual number?  Since I don't
//control the input, this isn't a solution, it only happens to work for this one.

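I'm pretty sure I'm missing something silly. All of these regular expressions work on www.rubular.com; I realize that site targets Ruby, but it is handy for testing.

I'm also not married to regular expressions, so if you have another suggestion, I'm all ears.

I don't think it is relevant, but I'm testing this in LINQPad.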

2 Answers:

Answer 0: (score: 2)

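You have two problems:

  1. string.Replace does not work with regular expressions; it only replaces literal text. Use Regex.Replace instead.
  2. The ^ anchor in your pattern means the < must appear at the very beginning of the string. If you only want to remove the whitespace after the <, remove the anchor.

Try this:

    string test = inputXML.Replace("\r\n","");
    string test2 = Regex.Replace(test, @"<\s*", "<");
    Console.WriteLine(test2); // "    <blahblahblahblahblah>"

Or, if you also want to remove any whitespace before the <, use:

    string test = inputXML.Replace("\r\n","");
    string test2 = Regex.Replace(test, @"\s*<\s*", "<");
    Console.WriteLine(test2); // "<blahblahblahblahblah>"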
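To make the difference between the two calls concrete, here is a minimal sketch. The class and method names (ReplaceDemo, LiteralReplace, RegexReplace) are made up for illustration and are not part of the question's code; only string.Replace and Regex.Replace are real framework calls.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class ReplaceDemo
{
    // string.Replace searches for the literal characters '<', '\', 's', '+'.
    // That text does not occur in typical input, so the string comes back unchanged.
    public static string LiteralReplace(string input) =>
        input.Replace(@"<\s+", "<");

    // Regex.Replace interprets the same text as a pattern:
    // a '<' followed by one or more whitespace characters (line breaks included).
    public static string RegexReplace(string input) =>
        Regex.Replace(input, @"<\s+", "<");
}
```

Called with an input such as "<\r\n    blah>", LiteralReplace should return the input untouched, while RegexReplace should return "<blah>". Because \s also matches \r and \n, the separate Replace("\r\n", "") step is probably only needed for line breaks that do not directly follow a <.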

Answer 1: (score: 2)

Given the arbitrary content of the XML, removing whitespace inside the tag is probably safe. So:

string inputXML = @"
<
blahblahblahblahblah>";
string pattern = @"(?<=\<)\s+"; //match one or more whitespace following a <
var cleaned = Regex.Replace(inputXML,
                            pattern, 
                            string.Empty,
                            RegexOptions.Multiline);
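
As a rough sketch of how the cleaned string could then be handed to XElement, the lookbehind pattern can be wrapped in a small helper. XmlCleaner, StripSpaceAfterOpenBracket and ParseLenient are hypothetical names, and the sample input in the usage note is invented, since the question's simplified string has no closing tag. RegexOptions.Multiline is left out here because it only changes how ^ and $ behave, and this pattern uses neither.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper class, for illustration only.
public static class XmlCleaner
{
    // Removes any run of whitespace (spaces, tabs, line breaks) that directly follows a '<'.
    // The lookbehind (?<=<) checks for the bracket without consuming it.
    public static string StripSpaceAfterOpenBracket(string xml) =>
        Regex.Replace(xml, @"(?<=<)\s+", string.Empty);

    // Cleans the input first, then parses it; XElement.Parse still throws
    // an XmlException if the cleaned text is not well-formed XML.
    public static XElement ParseLenient(string xml) =>
        XElement.Parse(StripSpaceAfterOpenBracket(xml));
}
```

For example, XmlCleaner.ParseLenient("<\r\n    root>text</root>") should yield an element named root. One caveat under this approach: the pattern also touches a < that appears inside a comment or CDATA section, so it is worth checking whether the real input contains either.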