Symfony Crawler: getting the text inside an li

Time: 2018-10-30 12:16:51

Tags: domcrawler

I'm trying to use the Symfony Crawler in my project to crawl some HTML. The HTML looks like this:

<ul>
    <li>
        <strong>Online:</strong> 10/27/2018 10:44 PM
        <strong>
            &nbsp; 2 days ago
        </strong>
    </li>
    <li>
        <strong>Hearing Impaired:</strong> No
    </li>
    <li>
        <strong>Foreign parts:</strong> Yes </li>
    <li>
        <strong>Framerate:</strong> Not available
    </li>
    <li>
        <strong>Files:</strong> 2 (12,542 bytes)
    </li>
    <li>
        <strong>Production type:</strong> Translated a subtitle
    </li>
    <li>
        <strong>Release type:</strong> Web
    </li>


    <li>
        ---------------------------------------
    </li>
    <li itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">
        <strong>Rated:</strong>
        <span itemprop="ratingValue">10</span>/<span itemprop="bestRating">10</span> from
        <a href="/subtitles/alpha-2018/farsi_persian/1870137/ratings" title="View Ratings">
            <span itemprop="ratingCount">3</span> users
        </a>
    </li>
    <li>
        <strong>Voted as Good by:</strong> 3 users
    </li>
    <li>
        <strong>Downloads:</strong> 310
    </li>
</ul>

I want to convert this HTML into an associative array like this:

[
    'Online'           => '10/27/2018 10:44 PM' ,
    'Hearing Impaired' => 'No' ,
    'Foreign parts'    => 'Yes' ,
    'Framerate'        => 'Not available' ,
    'Files'            => '2 (12,542 bytes)' ,
    'Production type'  => 'Translated a subtitle' ,
    'Release type'     => 'Web' ,
    'Voted as Good by' => '3 users' ,
    'Downloads'        => '310' ,
]

This is what I have tried; it is as far as I have got with the code:

$details->filter('li')->each(function (Crawler $node) {
    return $node->html();
}),

But it gives me an array of the li tags with all of their HTML, like this:

\r\n
  \t\t\t\t\t\t\t\t<strong>Online:</strong>\r\n
  \t\t\t\t\t\t\t\t10/27/2018 10:44 PM\r\n
  \t\t\t\t\t\t\t\t<strong>\r\n
  \t\t\t\t\t\t\t\t\t  2 days ago\r\n
  \t\t\t\t\t\t\t\t</strong>\r\n
  \t\t\t\t\t\t\t

I don't know where all these \t come from. The HTML looks cleaner than this.
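
One likely explanation for the \t and \r\n sequences is that html() returns the inner markup verbatim, including the tabs and line breaks used to indent the page source; a browser or formatter collapses that whitespace, which would make the HTML look cleaner than it is. Below is a minimal sketch of one way to build the label => value array, not a definitive implementation. It uses PHP's built-in DOM classes so that it runs without Composer. The function name detailsToArray is hypothetical, and the rule "skip any li that has a child element other than strong" is an assumption inferred from the desired array, which omits the separator and "Rated" items.

```php
<?php
// Hypothetical helper (not part of Symfony): converts the <li> items found
// under $list into a label => value array.
function detailsToArray(DOMElement $list): array
{
    $result = [];

    foreach ($list->getElementsByTagName('li') as $li) {
        $label = null;   // text of the first <strong>, e.g. "Online:"
        $value = '';     // text nodes that are direct children of the <li>
        $skip  = false;

        foreach ($li->childNodes as $child) {
            if ($child->nodeType === XML_TEXT_NODE) {
                $value .= $child->nodeValue;
            } elseif ($child->nodeType === XML_ELEMENT_NODE) {
                if (strtolower($child->nodeName) !== 'strong') {
                    // Assumption: items with other child elements
                    // (such as the "Rated" item) are not wanted.
                    $skip = true;
                    break;
                }
                if ($label === null) {
                    $label = $child->textContent;
                }
            }
        }

        // Items without a <strong> label (the separator line) are skipped too.
        if ($skip || $label === null) {
            continue;
        }

        // Drop the trailing colon and collapse tabs/newlines into single spaces.
        $key   = rtrim(trim($label), ':');
        $value = trim(preg_replace('/\s+/', ' ', $value));

        $result[$key] = $value;
    }

    return $result;
}

// With the Crawler from the question, assuming $details wraps the <ul>
// (or an element that contains it):
// $data = detailsToArray($details->getNode(0));
```

Because only the direct text nodes of each li are collected, the "2 days ago" text inside the second strong is left out of the 'Online' value. For the list shown above this should produce the nine entries of the desired array. If only whitespace cleanup is needed, applying the same preg_replace and trim combination to $node->text() inside the each() callback is a simpler option.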

0 answers:

No answers