Regex to extract specific lines

Date: 2013-01-09 04:34:21

Tags: regex

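I get the following comments from a workflow application. I need to extract parts of the first line, along with the rest of the comment.

Here is a sample: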

Nelly Thomas (Approve) 12/27/2012 8:50 PM - 12/27/2012 8:52 PM
(Nelly Thomas) LazyApproval by nelly.thomas@joshworld.local Approved

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, 
when an unknown printer took a galley of type and scrambled it to make a type specimen book

It now needs to be extracted like this:

Nelly Thomas 12/27/2012 8:50 PM

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book

I need a regular expression to achieve this.

1 answer:

Answer 0 (score: 0)

Okay, here you go:

var s = "Nelly Thomas (Approve) 12/27/2012 8:50 PM - 12/27/2012 8:52 PM\n\
(Nelly Thomas) LazyApproval by nelly.thomas@joshworld.local Approved\n\
\n\
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, \n\
when an unknown printer took a galley of type and scrambled it to make a type specimen book";
// replace() returns a new string; s itself is not modified.
var result = s.replace(/(.+)\(.+\)\s((\d\d\/){2}\d{4}\s\d{1,2}:\d\d\s\w\w)\s-\s.+[\n|\r].+[\n|\r]{2}([^]+)/gi, '$1$2\n\n$4');

//Result: 
"Nelly Thomas 12/27/2012 8:50 PM

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, 
when an unknown printer took a galley of type and scrambled it to make a type specimen book"
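
Note that the result above still contains the line break inside the comment, whereas the requested output shows the comment on a single line. A minimal sketch of one way to collapse it, assuming the comment body has no paragraph breaks that should be preserved (the helper name `joinLines` is hypothetical and not part of the original answer):

```javascript
// Hypothetical helper: collapse each line break, together with any
// spaces or tabs around it, into a single space.
function joinLines(text) {
  return text.replace(/[ \t]*\r?\n[ \t]*/g, ' ');
}

var body = "standard dummy text ever since the 1500s, \nwhen an unknown printer took a galley";
console.log(joinLines(body));
// standard dummy text ever since the 1500s, when an unknown printer took a galley
```

Applied to group 4 of the match, this would give the single-line comment shown in the question.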

This regex works, but it isn't particularly pretty:

# /                   --> Regex start:
# (.+)                --> one or more characters: the name plus a trailing space (group #1)
# \(.+\)\s            --> followed by text in () and a space.
# ((\d\d\/){2}\d{4}\s --> followed by a date and
# \d{1,2}:\d\d\s\w\w) --> time (group #2; the inner (\d\d\/) is group #3)
# \s-\s               --> followed by ` - `
# .+                  --> followed by any characters except newlines. (The 2nd date)
# [\n|\r]             --> followed by a newline. (This class also matches a literal `|`.)
# .+                  --> followed by any characters except newlines. (The 2nd line)
# [\n|\r]{2}          --> followed by 2 newlines.
# ([^]+)              --> followed by _any_ characters, including newlines (group #4)
# /gi                 --> Regex end, (g)lobal flag, case (i)nsensitive flag.

Then output groups 1, 2 and 4, with two newlines (a blank line) between groups 2 and 4.
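
The same groups can also be read directly with `exec` instead of `replace`. The following is only a sketch: it swaps `[\n|\r]` for `\r?\n` and `[^]` for `[\s\S]`, which are assumptions on my part rather than the answer's exact pattern, and the helper name `extractParts` is hypothetical.

```javascript
// Same structure as the pattern above, with two assumed tweaks:
//   \r?\n  instead of [\n|\r] (which also matches a literal "|")
//   [\s\S] instead of [^]     (which only works in JavaScript)
var pattern = /(.+)\(.+\)\s((\d\d\/){2}\d{4}\s\d{1,2}:\d\d\s\w\w)\s-\s.+\r?\n.+(?:\r?\n){2}([\s\S]+)/;

// Hypothetical helper: returns the extracted pieces, or null when the
// text does not match.
function extractParts(text) {
  var m = pattern.exec(text);
  if (m === null) return null;
  return {
    name: m[1].trim(), // group 1 carries a trailing space
    started: m[2],     // first date and time
    comment: m[4]      // everything after the blank line
  };
}

var sample = "Nelly Thomas (Approve) 12/27/2012 8:50 PM - 12/27/2012 8:52 PM\n" +
  "(Nelly Thomas) LazyApproval by nelly.thomas@joshworld.local Approved\n" +
  "\n" +
  "Lorem Ipsum is simply dummy text.";

var parts = extractParts(sample);
console.log(parts.name + " " + parts.started + "\n\n" + parts.comment);
```

Returning `null` on a failed match makes it easier to notice input that does not have the expected shape.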

So, it's ugly, but it works as long as the text follows this format:

  

W (W) DD/DD/DDDD D(?D):DD LL - W
W

W
D is a digit, L is a letter, W is any number of words and spaces, excluding newlines, and D(?D) means one or two digits.
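
When the text does not follow this format, `String.prototype.replace` simply returns the input unchanged, so a failure can go unnoticed. A minimal sketch of an up-front check that mirrors the notation above (the name `followsFormat` and the anchored pattern are assumptions for illustration, not part of the original answer):

```javascript
// Hypothetical check that mirrors the W/D/L notation:
//   W -> .+    D -> \d    L -> [A-Za-z]
var format = /^.+\(.+\)\s\d\d\/\d\d\/\d{4}\s\d{1,2}:\d\d\s[A-Za-z]{2}\s-\s.+\r?\n.+\r?\n\r?\n[\s\S]+$/;

function followsFormat(text) {
  return format.test(text);
}

var good = "Nelly Thomas (Approve) 12/27/2012 8:50 PM - 12/27/2012 8:52 PM\n" +
  "(Nelly Thomas) LazyApproval by nelly.thomas@joshworld.local Approved\n" +
  "\n" +
  "Some comment text.";

// A single-digit month and day do not fit DD/DD/DDDD.
var bad = good.replace(/12\/27\/2012/g, "1/9/2013");

console.log(followsFormat(good)); // true
console.log(followsFormat(bad));  // false
```

If dates such as 1/9/2013 can occur in the input, the date part would need `\d{1,2}` in place of `\d\d`.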