I'm trying to use the Symfony Crawler (DomCrawler) in my project to scrape some HTML. The HTML looks like this:
<ul>
<li>
<strong>Online:</strong> 10/27/2018 10:44 PM
<strong>
2 days ago
</strong>
</li>
<li>
<strong>Hearing Impaired:</strong> No
</li>
<li>
<strong>Foreign parts:</strong> Yes </li>
<li>
<strong>Framerate:</strong> Not available
</li>
<li>
<strong>Files:</strong> 2 (12,542 bytes)
</li>
<li>
<strong>Production type:</strong> Translated a subtitle
</li>
<li>
<strong>Release type:</strong> Web
</li>
<li>
---------------------------------------
</li>
<li itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">
<strong>Rated:</strong>
<span itemprop="ratingValue">10</span>/<span itemprop="bestRating">10</span> from
<a href="/subtitles/alpha-2018/farsi_persian/1870137/ratings" title="View Ratings">
<span itemprop="ratingCount">3</span> users
</a>
</li>
<li>
<strong>Voted as Good by:</strong> 3 users
</li>
<li>
<strong>Downloads:</strong> 310
</li>
</ul>
I want to convert this HTML into an associative array like this:
[
'Online' => '10/27/2018 10:44 PM' ,
'Hearing Impaired' => 'No' ,
'Foreign parts' => 'Yes' ,
'Framerate' => 'Not available' ,
'Files' => '2 (12,542 bytes)' ,
'Production type' => 'Translated a subtitle' ,
'Release type' => 'Web' ,
'Voted as Good by' => '3 users' ,
'Downloads' => '310' ,
]
Here is how far I have got with the code:
// $details is a Crawler instance; each() returns an array of the callback results
$items = $details->filter('li')->each(function (Crawler $node) {
    return $node->html();
});
But it gives me an array with the full inner HTML of every li element, like this:
\r\n
\t\t\t\t\t\t\t\t<strong>Online:</strong>\r\n
\t\t\t\t\t\t\t\t10/27/2018 10:44 PM\r\n
\t\t\t\t\t\t\t\t<strong>\r\n
\t\t\t\t\t\t\t\t\t 2 days ago\r\n
\t\t\t\t\t\t\t\t</strong>\r\n
\t\t\t\t\t\t\t
I don't know where all these \t characters come from. The HTML seems cleaner than that.
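
On the \t characters: html() returns the raw inner markup, so the tabs and \r\n are most likely just the indentation and line endings of the page source, which a browser's inspector collapses when displaying it. One possible approach, offered as a rough sketch rather than the definitive way: read the text node that directly follows the first strong in each li, collapse its whitespace, and skip any li where that text is empty (the separator row and the "Rated" row). The function names extractDetails and normalizeSpace are made up for illustration, the sample markup is a shortened copy of the HTML above, and PHP's built-in DOM extension is used here only so the example runs without Composer.

```php
<?php

// Hypothetical helper: collapse runs of whitespace (tabs, CR/LF) into single spaces.
function normalizeSpace(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Hypothetical helper: build a label => value map from <li> elements
// shaped like <li><strong>Label:</strong> value ...</li>.
function extractDetails(iterable $items): array
{
    $details = [];
    foreach ($items as $li) {
        if (!$li instanceof DOMElement) {
            continue;
        }
        $strong = $li->getElementsByTagName('strong')->item(0);
        if ($strong === null) {
            continue; // no label at all, e.g. the "-----" separator row
        }
        // Only the text node right after the label counts as the value.
        $next = $strong->nextSibling;
        if (!$next instanceof DOMText) {
            continue;
        }
        $value = normalizeSpace($next->textContent);
        if ($value === '') {
            continue; // label followed by markup only, e.g. "Rated:"
        }
        $key = rtrim(normalizeSpace($strong->textContent), ':');
        $details[$key] = $value;
    }
    return $details;
}

// Shortened sample of the markup from the question, including the tabs and CRLFs.
$html = "<ul>\r\n\t<li>\r\n\t\t<strong>Online:</strong>\r\n\t\t10/27/2018 10:44 PM\r\n"
      . "\t\t<strong>\r\n\t\t\t2 days ago\r\n\t\t</strong>\r\n\t</li>\r\n"
      . "\t<li>\r\n\t\t<strong>Hearing Impaired:</strong> No\r\n\t</li>\r\n"
      . "\t<li>\r\n\t\t-----\r\n\t</li>\r\n"
      . "\t<li>\r\n\t\t<strong>Rated:</strong>\r\n\t\t<span>10</span>/<span>10</span>\r\n\t</li>\r\n"
      . "</ul>";

$dom = new DOMDocument();
$dom->loadHTML($html);

$result = extractDetails($dom->getElementsByTagName('li'));
print_r($result);
```

For this sample the result should contain only 'Online' => '10/27/2018 10:44 PM' and 'Hearing Impaired' => 'No'. With the Crawler from the question, extractDetails($details->filter('li')) should behave the same way, assuming $details is a Crawler instance, because iterating over a Crawler yields the underlying DOM nodes.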