Ignoring carriage returns in a regular expression

Date: 2016-08-22 20:28:27

Tags: javascript regex

I'm currently trying to parse conversation files using JavaScript. Here is an example of such a conversation.

09/05/2016, 13:11 - Joe Bloggs: Hey Jane how're you doing?  what dates are you in London again? I realise that June isn't actually that far away so might book my trains down sooner than later!
09/05/2016, 13:47 - Jane Doe: Hey! I'm in london from the 12th-16th of june! Hope you can make it down :) sorry it's a bit annoying i couldn't make it there til a sunday!
09/05/2016, 14:03 - Joe Bloggs: Right I'll speak to my boss! I've just requested 5 weeks off in November/December to visit Aus so I'll see if I can negotiate some other days!

When does your uni term end in November? I'm thinking of visiting perth first then going to the east coast!
09/05/2016, 22:32 - Jane Doe: Oh that'll be awesome if you come to aus! Totally understand if it's too hard for you to request more days off in june. 

I finish uni early November! So should definitely be done by then if you came here
09/05/2016, 23:20 - Joe Bloggs: I could maybe get a couple of days  when do you fly into London on the Sunday?

Perfect! I need to speak to everyone else to make sure they're about. I can't wait to visit but it's so far away!
09/05/2016, 23:30 - Jane Doe: I fly in at like 7.30am so I'll have that whole day!

I'm sure the year will fly since it's may already haha
09/05/2016, 23:34 - Joe Bloggs: Aw nice one! Even if I can get just Monday off I can get an early train on Sunday 

My current regular expression looks like this:

/(\d{2}\/\d{2}\/\d{4}),\s(\d(?:\d)?:\d{2})\s-\s([^:]*):\s(.*?)(?=\s*\d{2}\/|$)/gm

My approach is almost there: it gives me the 4 groups I expect.

{
    "group": 1,
    "value": "09/05/2016"
  },
  {
    "group": 2,
    "value": "13:11"
  },
  {
    "group": 3,
    "value": "Joe Bloggs"
  },
  {
    "group": 4,
    "value": "Hey Jane how're you doing?  what dates are you in London again? I realise that June isn't actually that far away so might book my trains down sooner than later!"
  }

The problem occurs when the message (group 4) contains a carriage return. (See the third message in the sample conversation above.)

I did some research, and using

[\s\S]
did not solve my problem. The pattern simply stops and moves on to the next occurrence.

For the third message in the conversation, the text is cut off at the carriage return.
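A minimal reproduction of the cut-off, assuming the pattern is run with String.prototype.matchAll on a shortened excerpt of the conversation (the variable names are only illustrative). The lookahead's `$` alternative seems to be satisfied at the end of the first line because of the m flag, so the lazy group stops there:

```javascript
// The pattern from the question, written as a regex literal with its g and m flags.
const originalPattern =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d(?:\d)?:\d{2})\s-\s([^:]*):\s(.*?)(?=\s*\d{2}\/|$)/gm;

// Shortened copy of the third and fourth messages from the sample conversation.
const excerpt = [
  "09/05/2016, 14:03 - Joe Bloggs: Right I'll speak to my boss!",
  "",
  "When does your uni term end in November?",
  "09/05/2016, 22:32 - Jane Doe: Oh that'll be awesome if you come to aus!",
].join("\n");

// Collect only group 4 (the message text) of every match.
const captured = Array.from(excerpt.matchAll(originalPattern), (m) => m[4]);

// The "When does your uni term end..." line is missing from every captured message.
console.log(captured);
```

Swapping `.*?` for `[\s\S]*?` in this sketch gives the same result, since the lazy quantifier still stops at the first position where the lookahead succeeds.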


Any help would be greatly appreciated!

1 answer:

Answer 0 (score: 2)

Try

(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/).*)*)

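(https://regex101.com/r/sA3sB8/2)

This scans to the end of the line, then uses a repeated group that first checks whether the next line begins with \d\d/ (the start of a date). If it does not, the whole of that next line is captured as well.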
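As a rough sketch of how this pattern could be used from JavaScript: the helper name parseConversation, the returned field names, and the line-ending normalization step are my own assumptions rather than part of the pattern itself. The g flag is added because matchAll requires it.

```javascript
// The pattern from this answer, with the g flag added so matchAll can iterate over all messages.
const messagePattern =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/).*)*)/g;

// Hypothetical helper: returns one object per message.
function parseConversation(text) {
  // Assumption: the file may use \r\n or bare \r line endings. The pattern only
  // steps over \n (and "." does not match \r), so normalize to \n first.
  const normalized = text.replace(/\r\n?/g, "\n");
  return Array.from(normalized.matchAll(messagePattern), (m) => ({
    date: m[1],
    time: m[2],
    sender: m[3],
    message: m[4],
  }));
}

// Shortened excerpt of the conversation from the question.
const sample = [
  "09/05/2016, 13:47 - Jane Doe: Hey! I'm in london from the 12th-16th of june!",
  "09/05/2016, 14:03 - Joe Bloggs: Right I'll speak to my boss!",
  "",
  "When does your uni term end in November?",
  "09/05/2016, 22:32 - Jane Doe: Oh that'll be awesome if you come to aus!",
].join("\n");

const messages = parseConversation(sample);
console.log(messages.length);      // 3
console.log(messages[1].message);  // both paragraphs of Joe's message, blank line included
```

The captured message keeps its internal line breaks, so the blank line between paragraphs is still present in group 4.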

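If you're worried about edge cases where a message line happens to start with two digits followed by a forward slash, you can make the negative lookahead more specific. That increases the number of steps, but makes the match safer.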

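If a user actually typed a line break followed by a date in that syntax, you could run into trouble, because the match would stop at that point. I doubt they would also include the comma and a 24-hour time, though, so requiring those in the lookahead may be one way to handle this case.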

Example:

09/05/2016, 23:36 - Jane Doe: Great! Let me give you my travel details:

10/01/2016 @ 6am - Arrive at the station
10/01/2016 @ 7am - Get run over by a drunk horse carriage (the driver and the horse were both sober; the carriage stayed up a bit late to drink)
10/01/2016 @ 7:15am - Pull myself out from under the carriage and kick at its wheels vehemently.

09/05/2016, 23:40 - Joe Bloggs: Haha, sounds great.

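This is only an example (the corresponding fix adds more detail to the lookahead to handle it), meant to show that users can add text that would break this particular regex.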
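One possible way to make the lookahead more specific is sketched below. The stricter lookahead, which requires the full "dd/mm/yyyy, h:mm - " header before treating a line as a new message, is my own guess at such a fix, and the names loosePattern, strictPattern and parseWith are purely illustrative. The itinerary text is abbreviated from the example above.

```javascript
// Pattern from this answer: the lookahead only checks for two digits and a slash.
const loosePattern =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/).*)*)/g;

// Assumed stricter variant: a line only counts as a new message if it starts with
// a full date, a comma, a time and " - ".
const strictPattern =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/\d{2}\/\d{4},\s\d{1,2}:\d{2}\s-\s).*)*)/g;

// Hypothetical helper: run either pattern and keep sender and message text.
function parseWith(pattern, text) {
  return Array.from(text.matchAll(pattern), (m) => ({ sender: m[3], message: m[4] }));
}

// Abbreviated version of the travel-details example.
const tricky = [
  "09/05/2016, 23:36 - Jane Doe: Great! Let me give you my travel details:",
  "",
  "10/01/2016 @ 6am - Arrive at the station",
  "10/01/2016 @ 7am - Get run over by a drunk horse carriage",
  "",
  "09/05/2016, 23:40 - Joe Bloggs: Haha, sounds great.",
].join("\n");

const loose = parseWith(loosePattern, tricky);
const strict = parseWith(strictPattern, tricky);

console.log(loose[0].message);  // stops after "travel details:"
console.log(strict[0].message); // also includes the two itinerary lines
```

A message line that itself begins with a full "date, time - " header would still be split, so this narrows the edge case rather than removing it.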