Extracting data-src and data-srcset from img

Date: 2018-12-04 15:12:42

Tags: php regex

I'm trying to get the data-srcset and data-src attributes from a string containing many images in PHP. Both attributes are optional, so an image can have neither, only data-srcset, only data-src, or both. The regex I have is:

<img(.*?)data-src=['\"](.*?)['\"].*?|(data-srcset=['\"](.*?)['\"])?\/>

But it is too greedy when run against my test string. See here:

https://regex101.com/r/vDQE3C/1

Any help (including with the logic) is much appreciated.

2 answers:

Answer 0: (score: 1)

Don't use a regex to parse HTML. It's better to use a DOM parser, like this:

$html = <<< EOF
<li class="blocks-gallery-item">
  <figure>
    <img data-src="http://localhost:3000/wp-content/uploads/2018/11/detektivhut.gif" alt="" data-id="1037" data-link="http://localhost:3000/detektivhut/" class="wp-image-1037"/>
  </figure>
</li>
<li class="blocks-gallery-item">
  <figure>
    <img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04828.png" alt="" data-id="948" data-link="http://localhost:3000/dsc04828-2/" class="wp-image-948" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04828.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
  </figure>
</li>
<li class="blocks-gallery-item">
  <figure>
    <img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04831.png" alt="" data-id="883" data-link="http://localhost:3000/2018/11/13/single-page-style-1/dsc04831-2/" class="wp-image-883" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04831.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
  </figure>
</li>
EOF;

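// Collect libxml warnings about imperfect markup instead of silencing them with @.
libxml_use_internal_errors(true);
// loadHTML() is an instance method, so create the document first.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$images = $xpath->evaluate("//img");

foreach ($images as $img) {
   // getNamedItem() returns null when the attribute is absent.
   if (($el = $img->attributes->getNamedItem('data-src')) !== null)
      echo 'data-src=' . $el->nodeValue . "\n";
   if (($el = $img->attributes->getNamedItem('data-srcset')) !== null)
      echo 'data-srcset=' . $el->nodeValue . "\n";
}

Output: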

data-src=http://localhost:3000/wp-content/uploads/2018/11/detektivhut.gif
data-src=http://localhost:3000/wp-content/uploads/2018/11/DSC04828.png
data-srcset=//localhost:3000/wp-content/uploads/2018/11/DSC04828.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-1000x1500.png 1000w
data-src=http://localhost:3000/wp-content/uploads/2018/11/DSC04831.png
data-srcset=//localhost:3000/wp-content/uploads/2018/11/DSC04831.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-1000x1500.png 1000w
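
If the values are needed per image rather than printed, the same DOM approach can return an array. The following is only a sketch: the function name extractLazyAttributes and the short $sample string are made up for illustration and are not part of the answer above.

```php
<?php
// Hypothetical helper (not from the answer): returns one entry per <img>,
// using null for any attribute that is absent.
function extractLazyAttributes(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $result[] = [
            'data-src'    => $img->hasAttribute('data-src')
                ? $img->getAttribute('data-src') : null,
            'data-srcset' => $img->hasAttribute('data-srcset')
                ? $img->getAttribute('data-srcset') : null,
        ];
    }
    return $result;
}

// Made-up sample covering three of the cases from the question:
// only data-src, only data-srcset, and neither.
$sample = '<img data-src="a.gif" alt=""/>'
        . '<img alt="" data-srcset="b.png 200w"/>'
        . '<img alt=""/>';
$found = extractLazyAttributes($sample);
```

Using null for a missing attribute keeps one entry per image, so the caller can tell "no attribute" apart from an empty value.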

Answer 1: (score: 0)

You just need to account for everything between the data-* attributes and the image's closing />. You need another (.*?):

<img(.*?)data-src=['\"](.*?)['\"].*?data-srcset=['\"](.*?)['\"](.*?)\/>

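If you only want to capture the data-* attributes, consider using non-capturing groups, as shown below. That way the $1 and $2 variables contain only the data you want rather than the whole image tag.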

<img(?:.*?)data-src=['\"](.*?)['\"].*?data-srcset=['\"](.*?)['\"](?:.*?)\/>
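
Both patterns above expect data-src and data-srcset to be present, in that order, whereas the question says either may be missing. If a regex is still preferred, one possible approach is to isolate each img tag first and then look up each attribute on its own. This is a sketch under assumptions: attribute values are quoted and contain no > character, and the names extractWithRegex and $sample are hypothetical.

```php
<?php
// Hypothetical helper: two-pass regex extraction.
// Assumes quoted attribute values that contain no '>' character.
function extractWithRegex(string $html): array
{
    $result = [];
    // Pass 1: isolate each <img ...> tag so later matches
    // cannot run across tag boundaries.
    preg_match_all('/<img\b[^>]*>/i', $html, $tags);

    foreach ($tags[0] as $tag) {
        $entry = ['data-src' => null, 'data-srcset' => null];
        foreach (array_keys($entry) as $name) {
            // Pass 2: the leading \s and the required "=" stop
            // "data-src" from matching inside "data-srcset".
            $pattern = '/\s' . preg_quote($name, '/')
                     . '\s*=\s*(["\'])(.*?)\1/i';
            if (preg_match($pattern, $tag, $m)) {
                $entry[$name] = $m[2];
            }
        }
        $result[] = $entry;
    }
    return $result;
}

// Made-up sample: attributes in reverse order, one alone, and none.
$sample = '<img data-srcset="b.png 200w" class="x" data-src="b.png"/>'
        . '<img data-src="a.gif" alt=""/>'
        . '<img alt=""/>';
$matches = extractWithRegex($sample);
```

Matching the tag first keeps each lazy (.*?) confined to a single image, which avoids the over-matching described in the question. A DOM parser remains the more robust choice for arbitrary HTML.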