Scrape website data using Symfony's DomCrawler component

Time: 2015-09-14 15:57:02

Tags: php symfony

I need to scrape the numeric values that are marked with "CR" on a page such as this one:

http://webapps.nyc.gov:8084/cics/f704/f403001i?BBL=1-00259-0071

Unfortunately, I cannot find a solution to this using the DomCrawler filter method

http://symfony.com/doc/current/components/dom_crawler.html

Can any experienced Symfony users help me, or give me some advice?

This is what I have using the xpath method:

 $crawler->filterXPath("//div/center/table/tbody/tr/td[contains(., 'CR')]")->text();

Update: I managed to grab all the CRs using:

//td/font[contains(., 'CR')]

But what I need are the numbers.

Thank you

1 answer:

Answer 0 (score: 2)

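The Crawler is similar to SimpleXML and jQuery. If you are not familiar with them, it will be hard to figure out how to get at the content. You don't need to use XPath explicitly to get the content. You can do it with filter (similar to jQuery, i.e. filter('body > .my_class')).

Here is one way to do it: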
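use Symfony\Component\DomCrawler\Crawler;

$url = '...';

// Download the page and load its HTML into the crawler.
$crawler = new Crawler(file_get_contents($url));

// Select every <font> inside a <td> whose text contains " CR".
$crawler->filterXPath("//td/font[contains(., ' CR')]")->each(function (Crawler $node, $i) {
    // Read the text of the enclosing <td>. FILTER_SANITIZE_URL is used here only
    // as a quick way to strip whitespace and other characters not allowed in URLs.
    $string = filter_var($node->parents()->first()->text(), FILTER_SANITIZE_URL);
    // Put back the space in front of "CR" that the sanitizing step removed.
    $string = str_replace('CR', ' CR', $string);
    var_dump($string);
});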
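
Since the question asks for the numbers rather than the whole string, the amount still has to be separated from the "CR" marker. Below is a minimal sketch in plain PHP. The helper name parseCrAmount is hypothetical, and the assumed layout (digits with optional thousands separators and decimals, followed by "CR") has not been checked against the actual page, so adjust the pattern to what the cells really contain.

```php
<?php
// Hypothetical helper: extract the numeric amount from text such as "1,234.56 CR".
// Assumption: the amount is written as digits, optional "," thousands separators
// and optional decimals, followed (with or without a space) by "CR".
function parseCrAmount(string $text): ?float
{
    // Capture the number that sits directly in front of "CR".
    if (!preg_match('/(-?\d[\d,]*(?:\.\d+)?)\s*CR\b/', $text, $matches)) {
        return null; // no amount followed by "CR" in this text
    }

    // Drop the thousands separators before converting to a float.
    return (float) str_replace(',', '', $matches[1]);
}

var_dump(parseCrAmount('  1,234.56 CR ')); // float(1234.56)
var_dump(parseCrAmount('TOTAL'));          // NULL
```

Inside the each() callback above, something like parseCrAmount($string) could replace the var_dump call, with null results skipped.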