I want to extract text from this string:
Name:
    Franco Donezzi
Phone:
    01234567890
Email:
    franco@franco.com
Arrival date:
    16/12/2014
Departure date:
    28/12/2014
Guests:
    2 adults, 0 children
Further info:
    this is the text I want to match. there could be any amount of plain text here spread over multiple lines. sldkfjsldkfjs
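I want to extract 'Franco Donezzi', '01234567890', 'franco@franco.com', and so on.
So far I have always been able to run a regex against HTML or use simple-html-dom. There is a hacky way to do this: match up to the next colon, then remove the corresponding label (e.g. Phone) from the matched string. Is there a better way?
Thanks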
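Answer 0: (score: 1)
Just print the lines that have leading whitespace. With -n and the p flag, sed prints only the lines on which the substitution succeeded, with the leading whitespace stripped. For example, using sed: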
$ sed -n 's/^[[:space:]]\+//p' /tmp/corpus
Franco Donezzi
01234567890
franco@franco.com
16/12/2014
28/12/2014
2 adults, 0 children
this is the text I want to match. there could be any amount of plain text here spread over multiple lines. sldkfjsldkfjs
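If you also need to know which label each value belongs to, the same idea can be extended with awk. This is only a minimal sketch under stated assumptions: extract_fields is a hypothetical helper name, a label is assumed to be any line that starts in the first column and ends with a colon, and everything up to the next label is treated as that label's value (so multi-line "Further info" text is joined with spaces). Free text that happens to contain such a line would be misread as a label.

```shell
# Hypothetical helper: reads the text on stdin and prints "label<TAB>value".
# Assumption: a label starts in column one and ends with a colon;
# all following lines up to the next label form its value.
extract_fields() {
  awk '
    function emit() {
      if (label != "") printf "%s\t%s\n", label, value
    }
    /^[^ \t].*:[ \t]*$/ {
      emit()                          # finish the previous field, if any
      label = $0
      sub(/:[ \t]*$/, "", label)      # drop the trailing colon
      value = ""
      next
    }
    {
      line = $0
      sub(/^[ \t]+/, "", line)        # drop indentation, if present
      sub(/[ \t]+$/, "", line)        # drop trailing blanks
      if (line == "") next            # skip empty lines
      value = (value == "" ? line : value " " line)
    }
    END { emit() }
  '
}
```

Usage would be something like extract_fields < /tmp/corpus, which prints one tab-separated pair per field (for example Name, a tab, then Franco Donezzi). Because indentation is stripped rather than required, the sketch should behave the same whether or not the value lines are indented.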
Answer 1: (score: 1)