I'm trying to parse some HTML content. Here is the HTML:
<font color="green"> *TITLE* </font> Some Event Name 1:15pm-5:00pm <font color="gold">Stream 5</font><p>
<font color="green"> *TITLE* </font> Some: Event Name 1:30pm-5:00pm <font color="gold">Stream 4</font><p>
<font color="green"> *TITLE* </font> Some, Event Name 1 with num 1:30pm-7:30pm <font color="gold">CHANNEL TWO 2 STREAM http://http://domain.com/path/to/page-2-online.html</font><p>
<font color="green"> *TITLE* </font> Event two 2.45pm-4.45pm <font color="gold">Stream 16</font><p>
<font color="green"> *TITLE* </font> Event THREE summary 2.45pm-4.45pm <font color="gold">Stream 2</font><p>
<font color="green"> *TITLE* </font> Event with a lot of summary 4:00pm-6:00pm <font color="gold">CHANNEL THREE 3 STREAM http://domain.com/path/to/page-3-online.html</font><p>
To parse this and get the "event name", "event time" and "stream number", I'm doing this:
preg_match_all('/<\/font>\s*([^<]+)\s+(\d+.\d+\s*\w{2}\s*-\s*\d+.\d+\s*\w{2}).*?tream\s*(.*?)\s*<\/font><p>/', $data, $matches);
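It returns everything correctly, but for the stream number it also returns part of the http link, which I don't want. I only want the name (for some of them) & just the number.
Desired data: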
5
4
CHANNEL TWO 2 STREAM
16
2
CHANNEL THREE 3 STREAM
Currently it returns:
5
4
-online.html
16
2
-online.html
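Can anyone help? I'm no regex pro and have been trying for the past 2 days. Thanks in advance!
Answer 0 (score: 1)
If you do want it in regex, though, then based on your data you need this: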
preg_match_all('/(?:<\/font> )((?:[^0-9]+(?:[0-9](?!\.|:|[0-9]))?(?:[0-9]{2}(?!\.|:))?)*)([^<]+) <[^>]+>(?:Stream )?([^h<]+)/', $data, $matches);
This puts the name in $matches[1], the time in $matches[2] and the channel in $matches[3].
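A minimal PHP sketch of running this pattern, assuming the six sample lines from the question as input ($data is an illustrative name). Because ([^h<]+) stops just before "http", the channel labels come back with a trailing space, and the names keep the space before the time, so the sketch trims every captured value.

```php
<?php
// Sample input: the six lines from the question (nowdoc, so nothing needs escaping).
$data = <<<'HTML'
<font color="green"> *TITLE* </font> Some Event Name 1:15pm-5:00pm <font color="gold">Stream 5</font><p>
<font color="green"> *TITLE* </font> Some: Event Name 1:30pm-5:00pm <font color="gold">Stream 4</font><p>
<font color="green"> *TITLE* </font> Some, Event Name 1 with num 1:30pm-7:30pm <font color="gold">CHANNEL TWO 2 STREAM http://http://domain.com/path/to/page-2-online.html</font><p>
<font color="green"> *TITLE* </font> Event two 2.45pm-4.45pm <font color="gold">Stream 16</font><p>
<font color="green"> *TITLE* </font> Event THREE summary 2.45pm-4.45pm <font color="gold">Stream 2</font><p>
<font color="green"> *TITLE* </font> Event with a lot of summary 4:00pm-6:00pm <font color="gold">CHANNEL THREE 3 STREAM http://domain.com/path/to/page-3-online.html</font><p>
HTML;

// The pattern from this answer, unchanged.
$pattern = '/(?:<\/font> )((?:[^0-9]+(?:[0-9](?!\.|:|[0-9]))?(?:[0-9]{2}(?!\.|:))?)*)([^<]+) <[^>]+>(?:Stream )?([^h<]+)/';

preg_match_all($pattern, $data, $matches);

// Groups 1 and 3 keep a trailing space, so trim every captured value.
$names    = array_map('trim', $matches[1]);
$times    = array_map('trim', $matches[2]);
$channels = array_map('trim', $matches[3]);

foreach ($names as $i => $name) {
    echo $name, ' | ', $times[$i], ' | ', $channels[$i], PHP_EOL;
}
```

Note that ([^h<]+) would also stop at any lowercase "h" inside a channel label, which happens not to occur in this sample.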
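Explanation of the regex:
(?:<\/font> )
Search for (and ignore) the first closing font tag on each line, including the space after it.
((?:[^0-9]+(?:[0-9](?!\.|:|[0-9]))?(?:[0-9]{2}(?!\.|:))?)*)
Grab everything up to the start of the time: non-digits, plus any one or two digits that are not followed by a dot or colon (checked with negative lookaheads). Repeat as needed and capture it all as one group.
([^<]+)
Grab everything up to the next "<", except the trailing space.
<[^>]+>
Skip the tag: everything up to the next ">", and the ">" itself.
(?:Stream )?
If the first word is "Stream", ignore it.
([^h<]+)
Grab everything up to a lowercase "h" or a "<".
Answer 1 (score: 0)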
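This expression will:
find the <font> tags whose color is gold, skip the word Stream (if present), and return the text that follows, up to either an http link or the closing </font> tag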
<font(?=\s|>)(?=(?:[^>=|&)]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\scolor=['"]?gold['"]?)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>(?:Stream\s*)?\K(?:(?!\s*https?:|<\/font>).)*
Sample text
<font color="green"> *TITLE* </font> Some Event Name 1:15pm-5:00pm <font color="gold">Stream 5</font><p>
<font color="green"> *TITLE* </font> Some: Event Name 1:30pm-5:00pm <font color="gold">Stream 4</font><p>
<font color="green"> *TITLE* </font> Some, Event Name 1 with num 1:30pm-7:30pm <font color="gold">CHANNEL TWO 2 STREAM http://http://domain.com/path/to/page-2-online.html</font><p>
<font color="green"> *TITLE* </font> Event two 2.45pm-4.45pm <font color="gold">Stream 16</font><p>
<font color="green"> *TITLE* </font> Event THREE summary 2.45pm-4.45pm <font color="gold">Stream 2</font><p>
<font color="green"> *TITLE* </font> Event with a lot of summary 4:00pm-6:00pm <font color="gold">CHANNEL THREE 3 STREAM http://domain.com/path/to/page-3-online.html</font><p>
Matches
[0] => 5
[1] => 4
[2] => CHANNEL TWO 2 STREAM
[3] => 16
[4] => 2
[5] => CHANNEL THREE 3 STREAM
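A simplified PHP sketch of the same \K idea. It assumes the gold tag is always written literally as <font color="gold">, so the long attribute-matching lookahead is swapped for a plain literal; this is not the answer's exact expression. Because \K discards everything matched before it, $matches[0] holds only the stream text.

```php
<?php
// Sample input: the six lines from the question.
$data = <<<'HTML'
<font color="green"> *TITLE* </font> Some Event Name 1:15pm-5:00pm <font color="gold">Stream 5</font><p>
<font color="green"> *TITLE* </font> Some: Event Name 1:30pm-5:00pm <font color="gold">Stream 4</font><p>
<font color="green"> *TITLE* </font> Some, Event Name 1 with num 1:30pm-7:30pm <font color="gold">CHANNEL TWO 2 STREAM http://http://domain.com/path/to/page-2-online.html</font><p>
<font color="green"> *TITLE* </font> Event two 2.45pm-4.45pm <font color="gold">Stream 16</font><p>
<font color="green"> *TITLE* </font> Event THREE summary 2.45pm-4.45pm <font color="gold">Stream 2</font><p>
<font color="green"> *TITLE* </font> Event with a lot of summary 4:00pm-6:00pm <font color="gold">CHANNEL THREE 3 STREAM http://domain.com/path/to/page-3-online.html</font><p>
HTML;

// Assumption: the tag text is exactly <font color="gold">.
// (?:Stream\s*)? drops a leading "Stream", \K resets the match start,
// and the tempered dot stops before " http:" or "</font>".
$pattern = '~<font color="gold">(?:Stream\s*)?\K(?:(?!\s*https?:|</font>).)*~';

preg_match_all($pattern, $data, $matches);

print_r($matches[0]);
```

The ~ delimiter avoids having to escape the slash in </font>.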
Answer 2 (score: 0)
This expression will:
capture the title, event name, event time and stream number, skipping the word Stream (if present) and any trailing http link
<font(?=\s|>)(?=(?:[^>=|&)]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\scolor=['"]?green['"]?)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>\s*(?:Stream\s*)?((?:(?!<\/font>).)*)<\/font>\s*[^<]*?([^<]+)\s+(\d+.\d+\s*\w{2}\s*-\s*\d+.\d+\s*\w{2})[^<]*?<font(?=\s|>)(?=(?:[^>=|&)]*|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\scolor=['"]?gold['"]?)(?:[^>=|&)]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>(?:Stream\s*)?((?:(?!\s*https?:|<\/font>).)*)
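Group 0 gets the entire match
Group 1 gets the title
Group 2 gets the event name
Group 3 gets the event time
Group 4 gets the stream number
Sample text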
<font color="green"> *TITLE* </font> Some Event Name 1:15pm-5:00pm <font color="gold">Stream 5</font><p>
<font color="green"> *TITLE* </font> Some: Event Name 1:30pm-5:00pm <font color="gold">Stream 4</font><p>
<font color="green"> *TITLE* </font> Some, Event Name 1 with num 1:30pm-7:30pm <font color="gold">CHANNEL TWO 2 STREAM http://http://domain.com/path/to/page-2-online.html</font><p>
<font color="green"> *TITLE* </font> Event two 2.45pm-4.45pm <font color="gold">Stream 16</font><p>
<font color="green"> *TITLE* </font> Event THREE summary 2.45pm-4.45pm <font color="gold">Stream 2</font><p>
<font color="green"> *TITLE* </font> Event with a lot of summary 4:00pm-6:00pm <font color="gold">CHANNEL THREE 3 STREAM http://domain.com/path/to/page-3-online.html</font><p>
PHP code example
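<?php
// Replace with the HTML to parse.
$sourcestring="your source string";
// The regex is the one above; only the PHP escaping is fixed. Inside a single-quoted string,
// \\' ended the string early, so \\\\\' is needed to produce the regex \\' (and \\\\" for \\").
// PREG_SET_ORDER groups the results per match, as in the listing below.
preg_match_all('/<font(?=\s|>)(?=(?:[^>=|&)]*|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*?\scolor=[\'"]?green[\'"]?)(?:[^>=|&)]|=\'(?:[^\']|\\\\\')*\'|="(?:[^"]|\\\\")*"|=[^\'"][^\s>]*)*>\s*(?:Stream\s*)?((?:(?!<\/font>).)*)<\/font>\s*[^<]*?([^<]+)\s+(\d+.\d+\s*\w{2}\s*-\s*\d+.\d+\s*\w{2})[^<]*?<font(?=\s|>)(?=(?:[^>=|&)]*|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*?\scolor=[\'"]?gold[\'"]?)(?:[^>=|&)]|=\'(?:[^\']|\\\\\')*\'|="(?:[^"]|\\\\")*"|=[^\'"][^\s>]*)*>(?:Stream\s*)?((?:(?!\s*https?:|<\/font>).)*)
/imsx',$sourcestring,$matches,PREG_SET_ORDER);
echo "<pre>".print_r($matches,true);
?>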
Matches
[0][0] = <font color="green"> *TITLE* </font> Some Event Name 1:15pm-5:00pm <font color="gold">Stream 5
[0][1] = *TITLE*
[0][2] = Some Event Name
[0][3] = 1:15pm-5:00pm
[0][4] = 5
[1][0] = <font color="green"> *TITLE* </font> Some: Event Name 1:30pm-5:00pm <font color="gold">Stream 4
[1][1] = *TITLE*
[1][2] = Some: Event Name
[1][3] = 1:30pm-5:00pm
[1][4] = 4
[2][0] = <font color="green"> *TITLE* </font> Some, Event Name 1 with num 1:30pm-7:30pm <font color="gold">CHANNEL TWO 2 STREAM
[2][1] = *TITLE*
[2][2] = Some, Event Name 1 with num
[2][3] = 1:30pm-7:30pm
[2][4] = CHANNEL TWO 2 STREAM
[3][0] = <font color="green"> *TITLE* </font> Event two 2.45pm-4.45pm <font color="gold">Stream 16
[3][1] = *TITLE*
[3][2] = Event two
[3][3] = 2.45pm-4.45pm
[3][4] = 16
[4][0] = <font color="green"> *TITLE* </font> Event THREE summary 2.45pm-4.45pm <font color="gold">Stream 2
[4][1] = *TITLE*
[4][2] = Event THREE summary
[4][3] = 2.45pm-4.45pm
[4][4] = 2
[5][0] = <font color="green"> *TITLE* </font> Event with a lot of summary 4:00pm-6:00pm <font color="gold">CHANNEL THREE 3 STREAM
[5][1] = *TITLE*
[5][2] = Event with a lot of summary
[5][3] = 4:00pm-6:00pm
[5][4] = CHANNEL THREE 3 STREAM
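If the markup is always as regular as this sample, a shorter single pattern with named groups may be easier to maintain. This is a sketch under that assumption, not a general HTML parser: it expects the tag to be written exactly as <font color="gold">, times in the 1:15pm or 2.45pm form, and treats anything from " http" onward inside the gold tag as a link to drop. The group names name, time and stream are illustrative.

```php
<?php
// Sample input: the six lines from the question.
$data = <<<'HTML'
<font color="green"> *TITLE* </font> Some Event Name 1:15pm-5:00pm <font color="gold">Stream 5</font><p>
<font color="green"> *TITLE* </font> Some: Event Name 1:30pm-5:00pm <font color="gold">Stream 4</font><p>
<font color="green"> *TITLE* </font> Some, Event Name 1 with num 1:30pm-7:30pm <font color="gold">CHANNEL TWO 2 STREAM http://http://domain.com/path/to/page-2-online.html</font><p>
<font color="green"> *TITLE* </font> Event two 2.45pm-4.45pm <font color="gold">Stream 16</font><p>
<font color="green"> *TITLE* </font> Event THREE summary 2.45pm-4.45pm <font color="gold">Stream 2</font><p>
<font color="green"> *TITLE* </font> Event with a lot of summary 4:00pm-6:00pm <font color="gold">CHANNEL THREE 3 STREAM http://domain.com/path/to/page-3-online.html</font><p>
HTML;

// Assumed record shape: </font> name time-range <font color="gold">label [link]</font>
$pattern = '~</font>\s*(?<name>.+?)\s+'                                   // lazy name, up to the time
         . '(?<time>\d{1,2}[:.]\d{2}[ap]m\s*-\s*\d{1,2}[:.]\d{2}[ap]m)\s*' // e.g. 1:15pm-5:00pm
         . '<font color="gold">(?:Stream\s+)?'                            // drop a leading "Stream"
         . '(?<stream>.*?)(?:\s+https?://[^\s<]*)?</font>~';              // label, optional link dropped

preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    printf("%s | %s | %s\n", $m['name'], $m['time'], $m['stream']);
}
```

With PREG_SET_ORDER each element of $matches is one event, so the three values of a record stay together.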