Question

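I have an HTML file that contains product information, including each product's weight. I am trying to extract the weight (any number that comes before lbs). Occasionally there is a space between the number and lbs. I came up with this regular expression: preg_match(">[0-9]+(\.[0-9][0-9]?)(.*?)lbs/i",fgets($file),$matches); but it returns everything between the first '>' and 'lbs', which is not practical because there are a lot of tags involved. What I want is just the number that sits between the '>' directly before the weight and the 'lbs' after it, ignoring any space in between.

So in the example below, I would like to get 0.94, 0.12, 0.94.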

<td width="513" valign="top">0.94 lbs
<td width="513" valign="top">0.12lbs
<td width="513" valign="top">0.94LBS
<td width="513" valign="top">penguin lover

Note that the tag <td width="513" valign="top"> also precedes other text, not just the weights.

Any ideas or help would be greatly appreciated.

Answer 1

Use:

/(?<=>)[0-9]+(?:\.[0-9][0-9]?)(?=\s*lbs)/i

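This uses a lookahead and a lookbehind, so the only thing matched is the decimal number itself.

Explanation

(?<=>) Lookbehind that checks for > - (?<=xxx) means "look behind for xxx"

[0-9]+(?:\.[0-9][0-9]?) Your decimal regex, unchanged apart from using a non-capturing group (?:xxx)

(?=\s*lbs) Lookahead for zero or more whitespace characters followed by lbs

Note that you can replace [0-9] with \d if you prefer; they are equivalent here.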

Example code:

$str = '<td width="513" valign="top">0.94 lbs
        <td width="513" valign="top">0.12lbs
        <td width="513" valign="top">0.94LBS
        <td width="513" valign="top">penguin lover';

preg_match_all("/(?<=>)[0-9]+(?:\.[0-9][0-9]?)(?=\s*lbs)/i",$str,$matches);

print_r($matches[0]);

Output:

 Array ( [0] => 0.94 [1] => 0.12 [2] => 0.94 )
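
One thing to be aware of: the pattern above requires a decimal point, so a whole-number weight such as 5 lbs would not match. The sketch below is a variation, not part of the original answer, that makes the fractional part optional by adding a trailing ? to the non-capturing group. The "5 lbs" row is a hypothetical addition to the sample data; the question itself only shows decimal weights.

```php
<?php
// Hypothetical sample: the "5 lbs" row is an assumption, not from the question.
$str = '<td width="513" valign="top">0.94 lbs
<td width="513" valign="top">5 lbs
<td width="513" valign="top">0.12lbs
<td width="513" valign="top">penguin lover';

// Same lookbehind and lookahead as above, but the fractional part
// is now optional because of the trailing "?" after the group.
$pattern = '/(?<=>)[0-9]+(?:\.[0-9][0-9]?)?(?=\s*lbs)/i';

preg_match_all($pattern, $str, $matches);

print_r($matches[0]); // 0.94, 5, 0.12
```

If every weight in the file is guaranteed to have a decimal point, the original pattern is enough and this change is unnecessary.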

Answer 2

preg_match_all('/[0-9]+(?:\.[0-9]+)(?=\s*lbs)/i', $html, $matches);
print_r($matches[0]);

Regular expression:

[0-9]+         any character of: '0' to '9' (1 or more times)
(?:            group, but do not capture:
  \.           '.' 
  [0-9]+       any character of: '0' to '9' (1 or more times)
)              end of grouping
(?=            look ahead to see if there is:
  \s*          whitespace (\n, \r, \t, \f, and " ") (0 or more times)
  lbs          'lbs'
)              end of look-ahead

See the working demo.
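
The two answers differ in one respect worth seeing side by side: Answer 1 only accepts a number that directly follows >, while Answer 2 has no lookbehind and accepts a decimal number anywhere, as long as lbs follows it. The sketch below illustrates this on a hypothetical input where one row has a "Weight:" label between the tag and the number; that row is an assumption and does not appear in the question.

```php
<?php
// Hypothetical input: the second row has text between the tag and the number.
$html = '<td width="513" valign="top">0.94 lbs
<td width="513" valign="top">Weight: 0.12lbs';

// Answer 1's pattern: the number must directly follow ">".
preg_match_all('/(?<=>)[0-9]+(?:\.[0-9][0-9]?)(?=\s*lbs)/i', $html, $withLookbehind);

// Answer 2's pattern: any decimal number followed by optional whitespace and "lbs".
preg_match_all('/[0-9]+(?:\.[0-9]+)(?=\s*lbs)/i', $html, $withoutLookbehind);

print_r($withLookbehind[0]);    // 0.94 only
print_r($withoutLookbehind[0]); // 0.94 and 0.12
```

Which behaviour is preferable depends on the file: the lookbehind is stricter about where the number sits, whereas dropping it tolerates extra text inside the cell.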

Regex to match weight (lbs) within different HTML tags
