I can't work out why Regex.Replace with a negated character class is not replacing newlines with a space.
Here's some sample code:
using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string testInput = "This is a test. \n This is a newline. \n this is another newline. This is a, comma";
            Console.WriteLine(testInput);
            // Get rid of line breaks and other characters that are not allowed
            string commentFix = Regex.Replace(testInput, @"[^A-Z\sa-z\.0-9\-\:\;\$]", " ");
            // Wrap the result in double quotes
            commentFix = "\"" + commentFix + "\"";
            Console.WriteLine("\n");
            Console.WriteLine(commentFix);
            Console.ReadLine();
        }
    }
}
The output of this is:
This is a test.
This is a newline.
this is another newline. This is a, comma
"This is a test.
This is a newline.
this is another newline. This is a comma"
Any ideas? (thanks, this is my first question!)
Answer 0 (score: 3)
\s matches a newline, and because it is listed inside the negated character class, line breaks are excluded from the match and are therefore never replaced.
See more details on what \s matches at MSDN:
\f - The form feed character, \u000C.
\n - The newline character, \u000A.
\r - The carriage return character, \u000D.
\t - The tab character, \u0009.
\v - The vertical tab character, \u000B.
\x85 - The ellipsis or NEXT LINE (NEL) character (…), \u0085.
\p{Z} - Matches any separator character.
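To make the effect concrete, here is a minimal sketch that checks single characters against the pattern from the question. The class name WhitespaceDemo and the helper Matches are made up for illustration; only Regex.IsMatch comes from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Hypothetical helper: true when the single character c is matched by the pattern.
    public static bool Matches(string pattern, char c)
    {
        return Regex.IsMatch(c.ToString(), pattern);
    }

    public static void Main()
    {
        // The negated class from the question, which lists \s among the allowed characters.
        const string original = @"[^A-Z\sa-z\.0-9\-\:\;\$]";

        Console.WriteLine(Matches(@"\s", '\n'));     // True: \s on its own matches a newline
        Console.WriteLine(Matches(original, '\n'));  // False: the negated class therefore skips it
        Console.WriteLine(Matches(original, ','));   // True: a comma is not listed, so it is replaced
    }
}
```

This is why the comma disappears from the output in the question while the line breaks survive.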
So, if you want to replace whitespace as well, just take \s out of the character class. I guess you also want each run of matched characters replaced with a single space, so add +, which matches one or more occurrences of the pattern it quantifies:
[^A-Za-z.0-9:;$-]+
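As a rough sketch, the pattern above can be dropped into the code from the question like this. The class name CommentCleaner and the method Clean are illustrative names, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentCleaner
{
    // Replaces every run of characters outside the allowed set
    // (letters, digits, '.', ':', ';', '$', '-') with a single space.
    public static string Clean(string input)
    {
        return Regex.Replace(input, @"[^A-Za-z.0-9:;$-]+", " ");
    }

    public static void Main()
    {
        string testInput = "This is a test. \n This is a newline. \n this is another newline. This is a, comma";
        // Prints: "This is a test. This is a newline. this is another newline. This is a comma"
        Console.WriteLine("\"" + Clean(testInput) + "\"");
    }
}
```

Because the space is no longer in the allowed set, each " \n " sequence is matched as one run and collapses to a single space.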
Also note that you do not have to escape ., :, ; and $ inside a character class, and you do not have to escape - if it is at the beginning/end of the character class.
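A quick way to convince yourself is to run the escaped and unescaped forms side by side. This is only a sketch: EscapeDemo, its constants, and SameResult are hypothetical names, and the sample string is made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Same character class written with and without the redundant escapes.
    public const string Escaped = @"[^A-Za-z\.0-9\-\:\;\$]+";
    public const string Unescaped = @"[^A-Za-z.0-9:;$-]+";

    // True when both patterns produce the same replacement result for the input.
    public static bool SameResult(string input)
    {
        return Regex.Replace(input, Escaped, " ") == Regex.Replace(input, Unescaped, " ");
    }

    public static void Main()
    {
        Console.WriteLine(SameResult("price: $5.00; range 1-9, done!\n")); // True
    }
}
```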
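If you plan to match whitespace with the exception of CR and LF, use [^\S\r\n]. Here, [^\S] matches any whitespace, but \r and \n are inside the negated character class, so they are not matched. Note that this cannot simply be merged into the class above: in [^A-Z\S\r\na-z.0-9:;$-]+ the \S already covers every non-whitespace character, so that pattern is equivalent to [^\S\r\n]+ and no longer matches characters such as commas. To replace the disallowed characters together with all whitespace except CR and LF, add \r\n to the allowed set instead: [^A-Za-z.0-9:;$\r\n-]+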
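The [^\S\r\n] idiom on its own can be sketched as follows. The class name LineBreakKeeper and the method CollapseSpaces are illustrative, and the sample input is made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakKeeper
{
    // Collapses each run of whitespace other than CR and LF into one space,
    // leaving line breaks where they are.
    public static string CollapseSpaces(string input)
    {
        return Regex.Replace(input, @"[^\S\r\n]+", " ");
    }

    public static void Main()
    {
        string result = CollapseSpaces("a \t b\r\nc  d");
        // The space/tab runs are collapsed, but the \r\n pair is still there.
        Console.WriteLine(result == "a b\r\nc d"); // True
    }
}
```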