Regular expression to match against plain text

Time: 2014-04-28 23:05:59

Tags: regex

I want to extract the values from this string:

Name:
    Franco Donezzi  
Phone:
    01234567890
Email: 
    franco@franco.com

Arrival date:
    16/12/2014
Departure date:
    28/12/2014
Guests:
    2 adults, 0 children

Further info:
    this is the text I want to match. there could be any amount of plain text here spread over multiple lines. sldkfjsldkfjs

I want to extract 'Franco Donezzi', '01234567890', 'franco@franco.com' and so on.

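So far I have only used regex against HTML, or used simple-html-dom. There is a hacky way to do this: match up to the next colon and then remove the trailing label (e.g. Phone) from the matched string. Is there a better way?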

Thanks

2 answers:

Answer 0: (score: 1)

Sed example

Just print the lines that have leading whitespace, with that whitespace removed. For example, using sed:

$ sed -n 's/^[[:space:]]\+//p' /tmp/corpus 
Franco Donezzi  
01234567890
franco@franco.com
16/12/2014
28/12/2014
2 adults, 0 children
this is the text I want to match. there could be any amount of plain text here spread over multiple lines. sldkfjsldkfjs
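
For readers who are not working in a shell, the same idea can be sketched in Python. This is only an illustration under assumptions: the function name `indented_values` and the abridged sample string are made up here, and unlike the sed command it also trims trailing whitespace and skips whitespace-only lines.

```python
# Abridged copy of the sample text from the question.
corpus = """Name:
    Franco Donezzi
Phone:
    01234567890
Email:
    franco@franco.com

Arrival date:
    16/12/2014
"""


def indented_values(text):
    """Return the content of every line that starts with whitespace.

    Hypothetical helper: labels such as "Name:" start in column 0 and are
    skipped, while the indented value lines are kept.
    """
    values = []
    for line in text.splitlines():
        stripped = line.lstrip()
        # A line is a value line if stripping removed something and
        # something is left afterwards.
        if stripped and stripped != line:
            values.append(stripped.rstrip())
    return values


values = indented_values(corpus)
print(values)
```

This relies on the same assumption as the sed command: labels are never indented and values always are.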

Answer 1: (score: 1)

Take a look at this expression:

Name:\s*(.*?)

We first match Name: literally, followed by 0+ whitespace characters (\s*). Then we lazily capture 0+ characters ((.*?)).
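
A lazy group needs something after it to stop at. With nothing following, it matches as few characters as possible, which is none. The Python sketch below shows this, and then one possible extension that uses the next labels as stopping points. The extended pattern, the three-field sample and the use of `re.DOTALL` are assumptions made for illustration, not necessarily the expression this answer linked to.

```python
import re

# First three fields of the sample text from the question.
text = (
    "Name:\n"
    "    Franco Donezzi  \n"
    "Phone:\n"
    "    01234567890\n"
    "Email: \n"
    "    franco@franco.com\n"
)

# The expression exactly as shown: the lazy group is last in the pattern,
# so it is satisfied by zero characters.
bare = re.search(r"Name:\s*(.*?)", text).group(1)

# Assumed extension: each lazy group is followed by optional whitespace and
# the next label, so it grows only until that label can match. DOTALL lets
# "." cross newlines in case a value spans several lines.
pattern = re.compile(
    r"Name:\s*(.*?)\s*Phone:\s*(.*?)\s*Email:\s*(.*?)\s*$",
    re.DOTALL,
)
name, phone, email = pattern.search(text).groups()

print(repr(bare))
print(name, phone, email, sep=" | ")
```

The remaining labels (Arrival date, Departure date, Guests, Further info) could be chained the same way, which assumes the fields always appear in this order.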