Split by a delimiter without removing it from the string

Posted: 2014-05-12 18:16:59

Tags: c# regex string-split

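I want to use a regular expression to split a long string into separate lines. A line can contain any Unicode character. A line "ends" at a dot (".", one or more of them) or at a newline ("\n").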

Example:

This string would be the input:

"line1. line2.. line3... line4.... line5..... line6\nline7"

Output:

  • "line1."
  • "line2.."
  • "line3..."
  • "line4...."
  • "line5....."
  • "line6"
  • "line7"

3 answers:

Answer 0 (score: 1)

Try this:

// The replacement is a regular (non-verbatim) string so that \n is a real newline.
String result = Regex.Replace(subject, @"""?(\w+([.]+)?)(?:[\n ]|[""\n]$)+", "\"$1\"\n");

/*
"line1."
"line2.."
"line3..."
"line4...."
"line5....."
"line6"
"line7"
*/

Regex explanation

"?(\w+([.]+)?)(?:[\n ]|["\n]$)+

Match the character “"” literally «"?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the regular expression below and capture its match into backreference number 1 «(\w+([.]+)?)»
   Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match the regular expression below and capture its match into backreference number 2 «([.]+)?»
      Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
      Match the character “.” «[.]+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below «(?:[\n ]|["\n]$)+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match either the regular expression below (attempting the next alternative only if this one fails) «[\n ]»
      Match a single character present in the list below «[\n ]»
         A line feed character «\n»
         The character “ ” « »
   Or match regular expression number 2 below (the entire group fails if this one fails to match) «["\n]$»
      Match a single character present in the list below «["\n]»
         The character “"” «"»
         A line feed character «\n»
      Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
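A self-contained sketch of this answer as a C# top-level program (C# 9 or later). It assumes, as the pattern's leading "? and trailing ["\n]$ imply, that subject still contains the surrounding double quotes from the question, and that line6 and line7 are separated by a real newline. The replacement is a regular string literal because .NET does not process backslash escapes such as \n inside replacement strings:

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: the subject keeps the question's surrounding double quotes,
// and a real newline character separates line6 from line7.
string subject = "\"line1. line2.. line3... line4.... line5..... line6\nline7\"";

// Same pattern as the answer. Group 1 captures the word plus its trailing dots;
// the replacement wraps it in quotes and appends a real newline.
string result = Regex.Replace(subject, @"""?(\w+([.]+)?)(?:[\n ]|[""\n]$)+", "\"$1\"\n");

Console.Write(result);
```

Note that \w+ matches only letters, digits and underscores, so a line that contains spaces or other punctuation, which the question allows, is not captured whole by this pattern.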

Answer 1 (score: 1)

If I understand what you're asking, you could try a pattern like this:

(?<=\.)(?!\.)|\n

This splits the string at any position that is preceded by a "." but not followed by another ".", or at a "\n" character.

Note that this pattern keeps any whitespace that follows the dots. For example:

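// A regular string literal, so \n is a real newline (in @"..." it would be a backslash followed by n).
var input = "line1. line2.. line3... line4.... line5..... line6\nline7";
var output = Regex.Split(input, @"(?<=\.)(?!\.)|\n");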

produces

line1. 
 line2.. 
 line3... 
 line4.... 
 line5..... 
 line6 
line7 

If you want to get rid of the whitespace, just change it to:

(?<=\.)(?!\.)\s*|\n

But if you know the dots are always followed by whitespace, you can simplify it to:

(?<=\.)\s+|\n
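A runnable sketch (a C# 9+ top-level program) that applies the three patterns from this answer to the sample input and prints each piece in brackets so that leading spaces are visible. The input is a regular string literal, so \n is a real newline:

```csharp
using System;
using System.Text.RegularExpressions;

// Regular string literal (not @"..."), so \n is a real newline character.
string input = "line1. line2.. line3... line4.... line5..... line6\nline7";

// The three patterns from this answer.
string[] patterns =
{
    @"(?<=\.)(?!\.)|\n",     // zero-width split after the last dot; keeps the following space
    @"(?<=\.)(?!\.)\s*|\n",  // same split point, but also consumes the whitespace
    @"(?<=\.)\s+|\n",        // assumes every run of dots is followed by whitespace
};

foreach (string pattern in patterns)
{
    string[] lines = Regex.Split(input, pattern);
    // Brackets make leading and trailing spaces visible in the output.
    Console.WriteLine(pattern + " => " + string.Join(" ", Array.ConvertAll(lines, l => "[" + l + "]")));
}
```

With the first pattern, every piece after line1. starts with a space. The second and third patterns drop that space, and all three split line6 from line7 at the newline. None of the patterns contains a capturing group, so Regex.Split does not add any captured text to the result array.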

Answer 2 (score: 0)

If you want to keep all the dots intact, and a dot is always followed by a space, then this could be your regex:

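// \. matches a literal dot (a bare . matches any character); ".\n" is a regular string, so \n is a real newline.
String result = Regex.Replace(t, @"\.\s", ".\n");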

This produces a single string. You haven't said whether you want separate strings or just this one result.
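If separate strings are what you need, one option is to split the single result on the newlines it now contains. This sketch is not part of the original answer. It assumes the same sample input with a real newline between line6 and line7, and uses \. so that only a literal dot matches:

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: the sample input, with a real newline between line6 and line7.
string t = "line1. line2.. line3... line4.... line5..... line6\nline7";

// \. matches a literal dot and \s the whitespace character after it;
// the dot is written back, followed by a newline.
string result = Regex.Replace(t, @"\.\s", ".\n");

// To get separate strings, split the single result on the newlines.
string[] lines = result.Split('\n');

foreach (string line in lines)
    Console.WriteLine(line);
```

Because \s matches exactly one whitespace character, any extra spaces after a dot remain at the start of the next piece.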