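Regex: split a string into words on spaces while keeping delimited groups together

Date: 2018-02-13 12:15:43

Tags: c# regex

How can I perform this split using the Regex.Split(input, pattern) method?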

This is a [normal string ] made up of # different types # of characters

Expected output array of strings:

1. This 
2. is
3. a
4. [normal string ]
5. made
6. up
7. of
8. # different types #
9. of
10. characters

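It should also preserve the spaces, so I want to keep everything. If the input string contains 20 characters, the elements of the resulting array should add up to 20 characters.

What I have tried: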

Regex.Split(text, @"(?<=[ ]|# #)")

Regex.Split(text, @"(?<=[ ])(?<=# #")

3 answers:

Answer 0 (score: 2)

I suggest matching, i.e. extracting the words, rather than splitting:

using System;
using System.Linq;
using System.Text.RegularExpressions;

string source = @"This is a [normal string ] made up of # different types # of characters";

// Three possibilities:
//   - plain word [A-Za-z]+
//   - # ... # quotation
//   - [ ... ] quotation
string pattern = @"[A-Za-z]+|(#.*?#)|(\[.*?\])";

// Collect the text of every match, in order of appearance
var words = Regex
  .Matches(source, pattern)
  .OfType<Match>()
  .Select(match => match.Value)
  .ToArray();

// Print the words as a numbered list
Console.WriteLine(string.Join(Environment.NewLine, words
  .Select((w, i) => $"{i + 1}. {w}")));

Result:

1. This
2. is
3. a
4. [normal string ]
5. made
6. up
7. of
8. # different types #
9. of
10. characters
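
The question also asks that no characters be lost, which the pattern above does not cover because the spaces between words are skipped. As a rough sketch only (not part of the answer), the matching approach could be adapted so that every token keeps the whitespace around it. The class name KeepSpacesMatchSketch and the method Tokenize are hypothetical, and the pattern assumes that anything other than a [...] or #...# group counts as a run of non-whitespace characters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeepSpacesMatchSketch
{
    // Hypothetical helper: a token is a [...] group, a #...# group, or a run of
    // non-whitespace characters, plus any whitespace directly around it.
    public static string[] Tokenize(string source)
    {
        const string pattern = @"\s*(?:\[[^\]]*\]|#[^#]*#|\S+)\s*";
        return Regex.Matches(source, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        string source = "This is a [normal string ] made up of # different types # of characters";
        string[] tokens = Tokenize(source);

        // Quote each token so the trailing spaces are visible
        foreach (string token in tokens)
            Console.WriteLine($"'{token}'");

        // Check that the tokens add up to the original string
        Console.WriteLine(string.Concat(tokens) == source);
    }
}
```

For the sample string the tokens concatenate back to the input. A string that consists only of whitespace produces no matches, so that case would need separate handling.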

Answer 1 (score: 1)

You can use

var res = Regex.Split(s, @"(\[[^][]*]|#[^#]*#)|\s+")
    .Where(x => !string.IsNullOrEmpty(x));

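See the regex demo.

The (\[[^][]*]|#[^#]*#) part is a capturing group, so its value is added to the result list together with the split items.

Pattern details

  • (\[[^][]*]|#[^#]*#) - Group 1: either of two patterns:
    • \[[^][]*] - a [, followed by 0+ characters other than [ and ], and then a ]
    • #[^#]*# - a #, then 0+ characters other than #, and then a #
  • | - or
  • \s+ - 1+ whitespace characters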

C# demo

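using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "This is a [normal string ] made up of # different types # of characters";
// Captured groups are returned with the split items; drop the empty entries
var results = Regex.Split(s, @"(\[[^][]*]|#[^#]*#)|\s+")
    .Where(x => !string.IsNullOrEmpty(x));
Console.WriteLine(string.Join("\n", results));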

Result:

This
is
a
[normal string ]
made
up
of
# different types #
of
characters
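
Because Regex.Split returns the text of capturing groups along with the split items, the same idea can be pushed further to keep the whitespace as well. The following is a minimal sketch under that assumption, not the answerer's code: the whitespace alternative is moved inside the single capturing group, so each separator comes back as its own element. The names KeepSpacesSplitSketch and SplitKeepingWhitespace are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeepSpacesSplitSketch
{
    // Hypothetical helper: every delimiter (bracket group, hash group, or
    // whitespace run) is captured, so Regex.Split returns it as an element.
    public static string[] SplitKeepingWhitespace(string s)
    {
        return Regex.Split(s, @"(\[[^\]\[]*\]|#[^#]*#|\s+)")
            .Where(x => x.Length > 0)   // adjacent delimiters leave empty items
            .ToArray();
    }

    public static void Main()
    {
        var s = "This is a [normal string ] made up of # different types # of characters";
        string[] parts = SplitKeepingWhitespace(s);

        // Quote each part so whitespace-only elements are visible
        foreach (string part in parts)
            Console.WriteLine($"'{part}'");

        // Check that no character was lost
        Console.WriteLine(string.Concat(parts).Length == s.Length);
    }
}
```

With this variant the spaces are separate array elements rather than being attached to a word, which differs from the layout shown in the question but keeps the total character count equal to that of the input.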

Answer 2 (score: 0)

Using a matching approach would be easier, but it can be done with negative lookaheads:

[ ](?![^\]\[]*\])(?![^#]*\#([^#]*\#{2})*[^#]*$)

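This matches a space that is not followed by either of:

  • any sequence of characters other than [ and ], followed by a ]
  • a # followed by an even number of #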
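
To illustrate how such a pattern could be used with Regex.Split while keeping the spaces, here is a hedged sketch of a variant, not the exact pattern above. It makes three assumptions: the space is tested with a lookbehind instead of being consumed, so it stays attached to the preceding token; the inner group is non-capturing, so Regex.Split adds nothing extra to the result; and "an even number of #" is read as pairs written (?:[^#]*\#[^#]*\#)*. The names LookaroundSplitSketch and Split are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundSplitSketch
{
    // Zero-width split point: directly after a space, unless the rest of the
    // string shows that the space sits inside [...] or inside #...#.
    private const string Pattern =
        @"(?<=[ ])" +                              // preceded by a space
        @"(?![^\]\[]*\])" +                        // not inside [ ... ]
        @"(?![^#]*\#(?:[^#]*\#[^#]*\#)*[^#]*$)";   // not an odd number of # ahead

    public static string[] Split(string text)
    {
        return Regex.Split(text, Pattern);
    }

    public static void Main()
    {
        string text = "This is a [normal string ] made up of # different types # of characters";
        string[] parts = Split(text);

        // Quote each part so the trailing spaces are visible
        foreach (string part in parts)
            Console.WriteLine($"'{part}'");

        // Check that the parts add up to the original string
        Console.WriteLine(string.Concat(parts) == text);
    }
}
```

Since the split points are zero-width, every character of the input ends up in some element. An input that ends with a space would yield a trailing empty element, which may need to be filtered out.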