Question

I want to extract the values from this string:

Name:
    Franco Donezzi  
Phone:
    01234567890
Email: 
    franco@franco.com

Arrival date:
    16/12/2014
Departure date:
    28/12/2014
Guests:
    2 adults, 0 children

Further info:
    this is the text I want to match. there could be any amount of plain text here spread over multiple lines. sldkfjsldkfjs

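I want to extract 'Franco Donezzi', '01234567890', 'franco@franco.com', and so on.

Until now I have only used regex against HTML, or used simple-html-dom. There is a hacky way to do this, by matching up to the next colon and then removing the corresponding word (e.g. Phone) from the matched string, but is there a better way?

Thanks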

Answer 1

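Example with sed

Just print the lines that start with leading whitespace, stripping that whitespace as you go. For example, using sed: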

$ sed -n 's/^[[:space:]]\+//p' /tmp/corpus 
Franco Donezzi  
01234567890
franco@franco.com
16/12/2014
28/12/2014
2 adults, 0 children
this is the text I want to match. there could be any amount of plain text here spread over multiple lines. sldkfjsldkfjs
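
If sed is not available, the same idea can be sketched in Python. This is only an illustration of the approach above, not part of the original answer: the name indented_values and the shortened SAMPLE input are assumptions made for the example. Unlike the sed command, it also trims trailing spaces and skips lines that contain only whitespace.

```python
# Shortened sample in the same layout as the question's input (an assumption
# for illustration; the real input has more fields).
SAMPLE = (
    "Name:\n"
    "    Franco Donezzi  \n"
    "Phone:\n"
    "    01234567890\n"
    "Email: \n"
    "    franco@franco.com\n"
    "\n"
    "Guests:\n"
    "    2 adults, 0 children\n"
)


def indented_values(text):
    """Return the indented lines of `text` with surrounding whitespace removed.

    Hypothetical helper: label lines start in column 0, value lines are
    indented, so the indentation alone tells them apart.
    """
    values = []
    for line in text.splitlines():
        # line[:1] is "" for an empty line, and "".isspace() is False,
        # so blank lines are skipped without a separate check.
        if line[:1].isspace() and line.strip():
            values.append(line.strip())
    return values


if __name__ == "__main__":
    for value in indented_values(SAMPLE):
        print(value)
```

This gives the values in order but drops the labels, so it fits best when the fields always appear in the same sequence.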

Answer 2

Take a look at this expression:

Name:\s*(.*?)

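We first match Name: literally, followed by zero or more whitespace characters (\s*). Then we lazily capture zero or more characters ((.*?)). Note that a lazy group at the very end of a pattern matches as little as possible, which is nothing, so in practice it needs something after it to stop at, such as an end-of-line anchor or the next label.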
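
A minimal Python sketch of how this expression might be applied to any label. The helper name extract_field, the SAMPLE input, and the trailing lookahead are assumptions added for illustration and are not part of the original answer. The lookahead stops the lazy group at the next unindented line or at the end of the text.

```python
import re

# Shortened sample in the same layout as the question's input (an assumption
# for illustration).
SAMPLE = (
    "Name:\n"
    "    Franco Donezzi  \n"
    "Phone:\n"
    "    01234567890\n"
    "Email: \n"
    "    franco@franco.com\n"
    "\n"
    "Further info:\n"
    "    first line of free text\n"
    "    second line of free text\n"
)


def extract_field(label, text):
    """Return the value under `label`, or None if the label is absent.

    Hypothetical helper built around the Name:\\s*(.*?) idea.
    """
    pattern = (
        r"^" + re.escape(label) + r":\s*"  # literal label, colon, 0+ whitespace
        r"(.*?)"                            # lazily capture the value
        r"\s*(?=^\S|\Z)"                    # stop before the next unindented line or end of text
    )
    # MULTILINE lets ^ match at the start of each line;
    # DOTALL lets . cross newlines so multi-line values are captured.
    match = re.search(pattern, text, re.MULTILINE | re.DOTALL)
    return match.group(1) if match else None


if __name__ == "__main__":
    for label in ("Name", "Phone", "Email", "Further info"):
        print(label, "->", repr(extract_field(label, SAMPLE)))
```

Because the value is captured up to the next line that starts in column 0, a multi-line "Further info" block comes back as one string, with the indentation of its later lines left in place.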
