Question

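I need some help. I have a school assignment in which I have to write a regular expression that extracts an image (which then gets uploaded to a database, but that is not the problem). The real problem is that I get an array containing every image on the page, when it should be just one image, namely: data-src-l="/WebRoot/products/8020/80203122/bilder/80203122.jpg". This is the full markup for the image: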

  <li>
    <a href="/WebRoot/products/8020/80203122/bilder/80203122.jpg">
      <img
       itemprop="image"
       alt="Jesus Remember Me - Taize Songs (2CD)"
       src="/WebRoot/AsaphNL/Shops/asaphnl/5422/8F43/62EE/D698/EF8E/4DEB/AED5/3B0E/80203122_xs.jpg"
       data-src-xs="/WebRoot/AsaphNL/Shops/asaphnl/5422/8F43/62EE/D698/EF8E/4DEB/AED5/3B0E/80203122_xs.jpg"
       data-src-s="/WebRoot/products/8020/80203122/bilder/80203122_s.jpg"

       data-src-m="/WebRoot/products/8020/80203122/bilder/80203122_m.jpg"

       data-src-l="/WebRoot/products/8020/80203122/bilder/80203122.jpg"
     />
    </a>
  </li>

</ul>

This is the PHP code:

<?php
header('Content-Type: text/html; charset=utf-8');
$url = "http://www.asaphshop.nl/epages/asaphnl.sf/nl_NL/?ObjectPath=/Shops/asaphnl/Products/80203122";
$htmlcode = file_get_contents($url);
$pattern = "/<img\s[^>]*?src\s*=\s*['\"]([^'\"]*?)['\"][^>]*?>/";
preg_match_all($pattern, $htmlcode, $matches);
//print_r ($matches);
$image = ($matches[0]);
$image = str_replace('src="/', 'src="http://www.asaphshop.nl/', $image);
print_r ($image);
?>

Update: the image link must be prefixed with http://www.asaphshop.nl so that it points at the image on the website rather than at my localhost. If anything is unclear, just ask ;)

Answer 1

(<img\s[^>]*?data-src-l\s*=\s*['\"])([^'\"]*?['\"])([^>]*?>)

Try this. It matches only the desired img tag. Replace with $1http://www.asaphshop.nl$2$3. See the demo.

http://regex101.com/r/wQ1oW3/29
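
For reference, here is a minimal sketch of how a pattern along these lines could be used from PHP. It is a slight variation on the regex above, not the exact expression: it captures only the value of data-src-l and prepends the site root, instead of doing a replacement. The sample markup is shortened from the question, and the variable names ($html, $baseUrl, $imageUrl) are made up for illustration.

```php
<?php
// Sample markup taken from the question, shortened to the relevant attributes.
$html = <<<'HTML'
<li>
  <a href="/WebRoot/products/8020/80203122/bilder/80203122.jpg">
    <img
      itemprop="image"
      src="/WebRoot/AsaphNL/Shops/asaphnl/5422/8F43/62EE/D698/EF8E/4DEB/AED5/3B0E/80203122_xs.jpg"
      data-src-l="/WebRoot/products/8020/80203122/bilder/80203122.jpg"
    />
  </a>
</li>
HTML;

// Site root from the question's update (the variable name is an assumption).
$baseUrl = 'http://www.asaphshop.nl';

// Match an <img> tag and capture only the value of its data-src-l attribute.
$pattern = '/<img\s[^>]*?data-src-l\s*=\s*[\'"]([^\'"]*)[\'"][^>]*>/';

$imageUrl = null;
if (preg_match($pattern, $html, $m)) {
    // $m[1] holds the relative path, so prefix it with the site root.
    $imageUrl = $baseUrl . $m[1];
}

echo $imageUrl, PHP_EOL;
```

preg_match stops at the first match, which fits the "just one image" requirement. If the page contained several such tags, preg_match_all with the same pattern would collect all of them in $matches[1].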

Answer 2

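I need some help. I have a school assignment in which I have to write a regular expression script that extracts an image (which then gets uploaded to a database, but that is not the problem).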

Tell your school that regular expressions are not the best tool for the job.

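Of course, the counterargument is that regular expressions are not so regular and can be used for tasks such as palindrome matching. But that does not mean you should use them, because doing so causes a lot of trouble for you and for any other developer who later has to work with your code.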

What you should use is a proper HTML/XML parser.

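Fortunately, PHP has just what is needed: it is called DOMDocument. Have a look at its getElementsByTagName method, for example. You can use it to retrieve the images, then iterate over their attributes and process them however you like.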

Not only is this safer, because you do not have to worry about edge cases, it is also more readable.
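
As a rough illustration of the DOMDocument approach, the sketch below parses a small sample and keeps only the images that carry a data-src-l attribute. The first img tag is shortened from the question. The second one (/logo.png) is a made-up extra that shows the filtering, and the variable names are assumptions. On the real page, $html would come from file_get_contents($url) as in the question.

```php
<?php
// Sample markup: the first <img> is shortened from the question,
// the second one is a hypothetical extra without data-src-l.
$html = <<<'HTML'
<ul>
  <li>
    <a href="/WebRoot/products/8020/80203122/bilder/80203122.jpg">
      <img
        itemprop="image"
        alt="Jesus Remember Me - Taize Songs (2CD)"
        src="/WebRoot/AsaphNL/Shops/asaphnl/5422/8F43/62EE/D698/EF8E/4DEB/AED5/3B0E/80203122_xs.jpg"
        data-src-l="/WebRoot/products/8020/80203122/bilder/80203122.jpg"
      />
    </a>
  </li>
  <li><img src="/logo.png" alt="logo" /></li>
</ul>
HTML;

// Site root from the question's update (the variable name is an assumption).
$baseUrl = 'http://www.asaphshop.nl';

// Real-world HTML is rarely valid, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$imageUrls = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // Keep only images that carry the large-size attribute.
    if ($img->hasAttribute('data-src-l')) {
        $imageUrls[] = $baseUrl . $img->getAttribute('data-src-l');
    }
}

print_r($imageUrls);
```

Because the parser handles quoting, attribute order, and line breaks, the filter is a plain attribute check and no pattern needs to be maintained.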

Wrong image, regular expression
