Extract image names from HTML into an array

Time: 2016-05-27 14:05:25

Tags: php

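I have created some custom TinyMCE plugins that let users insert emoticons into HTML.

The emoticons are SVG files that live in different "emoticons" directories, whose names end with a digit from 1 to 9 depending on which emoticon category was selected.

The file names always start with "f", but they vary in length: some contain no underscores, some contain one, and others contain two or more.

When a user submits content, I run it through HtmlPurifier to make sure it is safe.

The final result is then stored in the database.

I want to extract an array of the distinct file names contained in each form submission so that I can track which files are popular.

For example, if the HTML looks like this: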

<img src="https://example.com/assets/includes/tinymce/plugins/emoticons1/img/fe012.svg" alt="fe012" width="80" height="80" /> and something else and then 
<img src="https://example.com/assets/includes/tinymce/plugins/emoticons5/img/f001fg.svg" alt="f001fg" width="63" height="63" /> and we did this 
<img src="https://example.com/assets/includes/tinymce/plugins/emoticons2/img/f001_004.svg" alt="f001_004" width="122" height="122" /> 
<img src="https://example.com/assets/includes/tinymce/plugins/emoticons9/img/f3332.svg" alt="f3332" width="58" height="58" /> maybe something here 
<img src="https://example.com/assets/includes/tinymce/plugins/emoticons3/img/f5553_0001.svg" alt="f5553_0001" width="245" height="245" /> going onto 
<img src="https://example.com/assets/includes/tinymce/plugins/emoticons4/img/f002a_00d2_fee1.svg" alt="f002a_00d2_fee1" width="30" height="30" />  
<img src="https://example.com/assets/includes/tinymce/plugins/emoticons5/img/f3321_a.svg" alt="f3321_a" width="69" height="69" />

Then, if I could find a way to extract the list of file names into an array, I could do this:

$files = array("fe012.svg","f001fg.svg", "f001_004.svg", "f3332.svg", "f5553_0001.svg", "f002a_00d2_fee1.svg", "f3321_a.svg");

for($x = 0; $x < count($files); $x++) {
    // update database to say image has been used
}
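
Since the same emoticon can appear several times in one submission, the loop above could be driven by a tally rather than the raw list. The following is only a sketch: `$recordUsage` is a hypothetical stand-in for the database update, which is not specified here, and the sample `$files` array is made up for illustration.

```php
<?php
// File names as they might appear in one submission (repeats are possible).
$files = array("fe012.svg", "f001fg.svg", "fe012.svg");

// array_count_values() maps each distinct name to how often it occurs.
$counts = array_count_values($files);

// Hypothetical stand-in for the real database update.
$recordUsage = function (string $file, int $times): void {
    echo "$file used $times time(s)" . PHP_EOL;
};

foreach ($counts as $file => $times) {
    $recordUsage($file, $times);
}
```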

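But this is tricky, because:

  1. The file names vary in length
  2. The image HTML is mixed in with other text
  3. The files live in different directories

The files do have this in common:

  1. The name always starts with "f"
  2. They are all .svg files

Update 1

Thanks to the suggestions so far, I have got this far: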

      $html = "<img src=\"https://example.com/assets/includes/tinymce/plugins/emoticons1/img/fe012.svg\" alt=\"fe012\" width=\"80\" height=\"80\" /> and something else and then <img src=\"https://example.com/assets/includes/tinymce/plugins/emoticons5/img/f001fg.svg\" alt=\"f001fg\" width=\"63\" height=\"63\" /> and we did this <img src=\"https://example.com/assets/includes/tinymce/plugins/emoticons2/img/f001_004.svg\" alt=\"f001_004\" width=\"122\" height=\"122\" /> <img src=\"https://example.com/assets/includes/tinymce/plugins/emoticons9/img/f3332.svg\" alt=\"f3332\" width=\"58\" height=\"58\" /> maybe something here <img src=\"https://example.com/assets/includes/tinymce/plugins/emoticons3/img/f5553_0001.svg\" alt=\"f5553_0001\" width=\"245\" height=\"245\" /> going onto <img src=\"https://example.com/assets/includes/tinymce/plugins/emoticons4/img/f002a_00d2_fee1.svg\" alt=\"f002a_00d2_fee1\" width=\"30\" height=\"30\" /> <img src=\"https://example.com/assets/includes/tinymce/plugins/emoticons5/img/f3321_a.svg\" alt=\"f3321_a\" width=\"69\" height=\"69\" />";
      
      $dom = new DOMDocument();
      
      $dom->loadHTML($html);
      
      $images = $dom->getElementsByTagName('img');
      
      foreach($images as $img) {
      
      }
      

Almost there, but how do I access the image name inside the foreach loop?

1 answer:

Answer 0 (score: 1)

You can use a DOMDocument object to get all the images in the HTML. Then take the base file name of each image's src attribute.

Try this:

<?php
$str = <<<EOT
<img src="https://example.com/assets/includes/tinymce/plugins/emoticons1/img/fe012.svg" alt="fe012" width="80" height="80" /> and something else and then 
<img src="https://example.com/assets/includes/tinymce/plugins/emoticons5/img/f001fg.svg" alt="f001fg" width="63" height="63" /> and we did this 
<img src="https://example.com/assets/includes/tinymce/plugins/emoticons2/img/f001_004.svg" alt="f001_004" width="122" height="122" /> 
<img src="https://example.com/assets/includes/tinymce/plugins/emoticons9/img/f3332.svg" alt="f3332" width="58" height="58" /> maybe something here 
<img src="https://example.com/assets/includes/tinymce/plugins/emoticons3/img/f5553_0001.svg" alt="f5553_0001" width="245" height="245" /> going onto 
<img src="https://example.com/assets/includes/tinymce/plugins/emoticons4/img/f002a_00d2_fee1.svg" alt="f002a_00d2_fee1" width="30" height="30" />  
<img src="https://example.com/assets/includes/tinymce/plugins/emoticons5/img/f3321_a.svg" alt="f3321_a" width="69" height="69" />
EOT;

$doc = new DOMDocument();
$doc->loadHTML($str);
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
   echo basename($tag->getAttribute('src')) . PHP_EOL;
}

Thanks to Jocelyn for suggesting how to make this more optimized.
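
The loop above echoes the names; to get the array the question asks for, the names could be collected instead. This is a minimal sketch, not the original answer's code: the function name `extractEmoticonFiles` is hypothetical, and the filter assumes that only names starting with "f" and ending in ".svg" should be counted, as described in the question.

```php
<?php
// Hypothetical helper: returns the distinct emoticon file names found in $html.
function extractEmoticonFiles(string $html): array
{
    if (trim($html) === '') {
        return array(); // loadHTML() warns or throws on empty input
    }

    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $files = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        // Take only the path part of the URL, dropping any query string.
        $path = parse_url($img->getAttribute('src'), PHP_URL_PATH);
        if (!is_string($path)) {
            continue;
        }
        $name = basename($path);
        // Assumed naming rule: starts with "f", ends in ".svg".
        if (preg_match('/^f\w*\.svg$/', $name)) {
            $files[$name] = true; // array keys deduplicate repeated names
        }
    }
    return array_keys($files);
}
```

Calling `extractEmoticonFiles($str)` on the sample HTML would then give an array that can be passed straight to the update loop from the question. Using the names as array keys keeps each file once per submission, in order of first appearance.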