C# - Regex.Replace - Can't figure out why newline is not being replaced

Date: 2016-04-04 18:59:28

Tags: c# regex

I can't figure out why Regex.Replace with a negated character class is not replacing newlines with a space.

Here's some sample code:

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string testInput = "This is a test. \n This is a newline. \n this is another newline. This is a, comma";

            Console.WriteLine(testInput);

            // get rid of line breaks and other characters that are not allowed
            string commentFix = Regex.Replace(testInput, @"[^A-Z\sa-z\.0-9\-\:\;\$]", " ");
            // wrap the result in double quotes
            commentFix = "\"" + commentFix + "\"";

            Console.WriteLine("\n");

            Console.WriteLine(commentFix);
            Console.ReadLine();
        }
    }
}

The output of this is:

This is a test.
 This is a newline.
 this is another newline. This is a, comma

"This is a test.
 This is a newline.
 this is another newline. This is a  comma"

Any ideas? (thanks, this is my first question!)

1 answer:

Answer 0 (score: 3)

\s matches newline characters, and because it sits inside a negated character class, line breaks are excluded from the match and therefore never replaced.

According to MSDN, \s is equivalent to [\f\n\r\t\v\x85\p{Z}], which covers:

\f - The form feed character, \u000C.
\n - The newline character, \u000A.
\r - The carriage return character, \u000D.
\t - The tab character, \u0009.
\v - The vertical tab character, \u000B.
\x85 - The ellipsis or NEXT LINE (NEL) character (…), \u0085.
\p{Z} - Matches any separator character.
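
As a quick check of the list above, here is a minimal sketch (the class name WhitespaceClassDemo is made up for illustration) that tests the line feed and the comma against \s and against the negated class from the question:

```csharp
using System;
using System.Text.RegularExpressions;

static class WhitespaceClassDemo
{
    // The negated class from the question, unchanged.
    public const string QuestionPattern = @"[^A-Z\sa-z\.0-9\-\:\;\$]";

    static void Main()
    {
        // \s matches the line feed...
        Console.WriteLine(Regex.IsMatch("\n", @"\s"));             // True
        // ...so a class that negates \s cannot match it.
        Console.WriteLine(Regex.IsMatch("\n", QuestionPattern));   // False
        // The comma is not listed in the class, so it is matched (and replaced).
        Console.WriteLine(Regex.IsMatch(",", QuestionPattern));    // True
    }
}
```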

So, if you want whitespace (including newlines) to be replaced, take \s out of the character class. You probably also want each run of disallowed characters replaced with a single space; for that, add the + quantifier, which matches one or more occurrences of the pattern it quantifies:

[^A-Za-z.0-9:;$-]+
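
Applied to the sample input from the question, the pattern above could be used roughly like this. This is only a sketch: the CommentCleaner class and its Clean method are hypothetical names, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentCleaner
{
    // Hypothetical helper: replaces every run of characters outside the
    // allowed set (letters, digits, '.', ':', ';', '$', '-') with one space.
    public static string Clean(string input)
    {
        return Regex.Replace(input, @"[^A-Za-z.0-9:;$-]+", " ");
    }

    static void Main()
    {
        string testInput = "This is a test. \n This is a newline. \n this is another newline. This is a, comma";

        // Prints: "This is a test. This is a newline. this is another newline. This is a comma"
        Console.WriteLine("\"" + Clean(testInput) + "\"");
    }
}
```

Because spaces are no longer in the allowed set, each " \n " run and the ", " after "a" collapse into a single space.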


Also note that you do not have to escape ., :, ; and $ inside a character class, and you do not have to escape - if it is at the beginning/end of the character class.

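If you want to match whitespace except CR and LF, use [^\S\r\n]. Here, [^\S] matches any whitespace character (a double negative), and because \r and \n are also listed inside the negated class, they are not matched. Note that merging this into the original class, as in [^A-Z\S\r\na-z.0-9:;$-]+, adds nothing: \S already covers letters, digits and punctuation, so that pattern matches exactly what [^\S\r\n]+ matches and will not replace characters such as the comma.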
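
To see what [^\S\r\n] does in practice, here is a small sketch (HorizontalWhitespace and Collapse are illustrative names only). It collapses spaces and tabs but leaves line breaks alone, and it also shows that the longer class containing \S leaves a comma untouched, since \S already covers every non-whitespace character:

```csharp
using System;
using System.Text.RegularExpressions;

static class HorizontalWhitespace
{
    // Hypothetical helper: collapses runs of whitespace other than CR/LF
    // into a single space, keeping line breaks intact.
    public static string Collapse(string input)
    {
        return Regex.Replace(input, @"[^\S\r\n]+", " ");
    }

    static void Main()
    {
        // Spaces and tabs are collapsed, the CRLF pair survives.
        Console.WriteLine(Collapse("a \t b\r\nc  d") == "a b\r\nc d");   // True

        // The longer class behaves the same way: only the space is replaced,
        // the comma is covered by \S and stays.
        string viaLongClass = Regex.Replace("a, b", @"[^A-Z\S\r\na-z.0-9:;$-]+", "_");
        Console.WriteLine(viaLongClass);   // a,_b
    }
}
```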