I need to scrape the numeric values marked with "CR" on pages of this website, such as:
http://webapps.nyc.gov:8084/cics/f704/f403001i?BBL=1-00259-0071
Unfortunately, I cannot find a way to do this with the DomCrawler filter method:
http://symfony.com/doc/current/components/dom_crawler.html
Can any experienced Symfony users help me or give me some advice?
This is what I have using the xpath method:
$crawler->filterXPath("//div/center/table/tbody/tr/td[contains(., 'CR')]")->text();
Update: I managed to grab all the CRs using:
//td/font[contains(., 'CR')]
But what I need are the numbers.
Thank you
Answer 0 (score: 2)
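The Crawler is similar to SimpleXML and jQuery. If you are not familiar with them, it will be hard to work out how to get at the content. You do not need to use xpath explicitly to get the content. You can use filter (similar to jQuery, i.e. filter('body > .my_class')):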
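use Symfony\Component\DomCrawler\Crawler;

$url = '...';
$crawler = new Crawler(file_get_contents($url));
// Visit every <font> element inside a <td> whose text contains " CR".
$crawler->filterXPath("//td/font[contains(., ' CR')]")->each(function (Crawler $node, $i) {
    // Read the parent cell's text; FILTER_SANITIZE_URL strips whitespace and other non-URL characters.
    // Note: on Symfony 5.3 and later, parents() is deprecated in favor of ancestors().
    $string = filter_var($node->parents()->first()->text(), FILTER_SANITIZE_URL);
    // Put back a space before "CR", which the sanitizing step removed.
    $string = str_replace('CR', ' CR', $string);
    var_dump($string);
});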
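Since the question asks for the numbers themselves rather than the whole string, one possible follow-up step is to pull the amount out with a regular expression. This is only a sketch: extractCrAmount is a hypothetical helper (not part of DomCrawler), and it assumes the cell text looks like "1,234.56 CR", which I have not verified against the actual page.

```php
<?php
// Hypothetical helper, not part of DomCrawler.
// Assumption: the text contains an amount such as "1,234.56" directly before "CR".
function extractCrAmount(string $text): ?float
{
    // Digits with optional thousands separators and decimals, then optional space, then "CR".
    if (preg_match('/(\d[\d,]*(?:\.\d+)?)\s*CR\b/', $text, $m) !== 1) {
        return null; // no amount followed by "CR" was found
    }
    // Drop the thousands separators before converting to a float.
    return (float) str_replace(',', '', $m[1]);
}

var_dump(extractCrAmount('1,234.56 CR'));    // float(1234.56)
var_dump(extractCrAmount('no credit here')); // NULL
```

Inside the each() callback you could call this on the node text instead of var_dump-ing the raw string. If the real page formats its amounts differently, the pattern will need adjusting.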