How to use PHP and XPath to get strings from an HTML page (POST request?)

Asked: 2017-12-19 19:43:10

Tags: php xpath web-scraping

I am trying to scrape this web page...

https://www.aslteramo.it/SISWebOnLine/ProntoSoccorso.aspx

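...using PHP and XPath to get the numeric values shown under the red, yellow, green and white circles.

(Note: if you browse the page yourself you may see different values. That does not matter; the values change over time.)

I tried this PHP code sample to print one of the values...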

<?php
    ini_set('display_errors', 'On');
    error_reporting(E_ALL);

    $url = 'http://www.aslteramo.it/SISWebOnLine/ProntoSoccorso.aspx';

    $xpath_for_parsing = '/html/body/div/form/div[3]/div[2]/div[3]/div/div/div[2]/table/tbody/tr[2]/td[4]/table/tbody/tr[1]/td';


    //#Set CURL parameters: pay attention to the PROXY config !!!!
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, '');
    $data = curl_exec($ch);
    curl_close($ch);

    $dom = new DOMDocument();
    @$dom->loadHTML($data);

    $xpath = new DOMXPath($dom);

    $colorWaitingNumber = $xpath->query($xpath_for_parsing);
    $theValue =  'N.D.';
    foreach( $colorWaitingNumber as $node )
    {
      $theValue = $node->nodeValue;
    }

    print $theValue;
?>

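Note that to get an element's XPath you have to disable JavaScript in your browser, because right-clicking is disabled on the page.

I can see that the page makes a POST request...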

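...but I don't know how to modify my code to perform that request, nor how to extract my values afterwards...

Any help would be appreciated.

Thanks in advance.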

1 Answer:

Answer 0: (score: 1)

  

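"I can see that the page makes a POST request..."

A plain GET does not return the data, because the values are fetched by that POST request once the page has loaded. You need to perform the same POST request: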

$curl = curl_init();

curl_setopt_array($curl, array(
  CURLOPT_URL => "https://www.aslteramo.it/SISWebOnLine/ProntoSoccorso.aspx",
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => "",
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 30,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => "POST",
  // this is to emulate the page behavior
  CURLOPT_POSTFIELDS => "ctl00%24ScriptManager1=ctl00%24MainContent%24UpdatePanel1%7Cctl00%24MainContent%24Timer1&__EVENTTARGET=ctl00%24MainContent%24Timer1&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDwUKLTYxOTg2MDY2NA9kFgJmD2QWAgIDD2QWBgIDDzwrAA0CAA8WAh4LXyFEYXRhQm91bmRnZAwUKwAGBRMwOjAsMDoxLDA6MiwwOjMsMDo0FCsAAhYQHgRUZXh0BQ1Ib21lIHBhZ2UgQVNMHgVWYWx1ZQUNSG9tZSBwYWdlIEFTTB4LTmF2aWdhdGVVcmwFF2h0dHA6Ly93d3cuYXNsdGVyYW1vLml0HgdUb29sVGlwBRxQYWdpbmEgaW5pemlhbGUgZGVsIHNpdG8gQVNMHgdFbmFibGVkZx4KU2VsZWN0YWJsZWceCERhdGFQYXRoBRdodHRwOi8vd3d3LmFzbHRlcmFtby5pdB4JRGF0YUJvdW5kZ2QUKwACFhIfBWcfBmcfCGcfBwUhL3Npc3dlYm9ubGluZS9wcm9udG9zb2Njb3Jzby5hc3B4HwEFD1Byb250byBTb2Njb3Jzbx8CBQ9Qcm9udG8gU29jY29yc28fBAUeVGVtcGkgZCdhdHRlc2EgUHJvbnRvIFNvY2NvcnNvHghTZWxlY3RlZGcfAwUhL1NJU1dlYk9uTGluZS9Qcm9udG9Tb2Njb3Jzby5hc3B4ZBQrAAIWEB8BBQ5UZW1waSBkJ2F0dGVzYR8CBQ5UZW1waSBkJ2F0dGVzYR8DBSAvU0lTV2ViT25MaW5lL1RlbXBpRGlhdHRlc2EuYXNweB8EBShUZW1waSBkJ2F0dGVzYSBwcmVzdGF6aW9uaSBhbWJ1bGF0b3JpYWxpHwVnHwZnHwcFIC9zaXN3ZWJvbmxpbmUvdGVtcGlkaWF0dGVzYS5hc3B4HwhnZBQrAAIWEB8BBRZMaXN0YSBkJ0F0dGVzYSBFeC1Qb3N0HwIFFkxpc3RhIGQnQXR0ZXNhIEV4LVBvc3QfAwUpamF2YXNjcmlwdDpvcGVuV2ViRm9ybSgnV2ViRXhQb3N0LmFzcHgnKTsfBAUnTW9uaXRvcmFnZ2lvIExpc3RhIGQnQXR0ZXNhIC0gKEV4LVBvc3QpHwVnHwZnHwcFKWphdmFzY3JpcHQ6b3BlbndlYmZvcm0oJ3dlYmV4cG9zdC5hc3B4Jyk7HwhnZBQrAAIWEB8BBR5BdHRpdml0w6AgbGliZXJvLXByb2Zlc3Npb25hbGUfAgUeQXR0aXZpdMOgIGxpYmVyby1wcm9mZXNzaW9uYWxlHwMFHy9TSVNXZWJPbkxpbmUvQXR0aXZpdGFBbHBpLmFzcHgfBAUeQXR0aXZpdMOgIGxpYmVyby1wcm9mZXNzaW9uYWxlHwVnHwZnHwcFHy9zaXN3ZWJvbmxpbmUvYXR0aXZpdGFhbHBpLmFzcHgfCGdkZAIJDw8WAh8BBQ9Qcm9udG8gU29jY29yc29kZAILD2QWAgIBD2QWAmYPZBYGAgEPFgIfBWdkAgsPPCsADQBkAg0PFgIfBWdkGAMFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYBBSBjdGwwMCRNYWluQ29udGVudCRJbWdCdG5BZ2dpb3JuYQUVY3RsMDAkTWFpbkNvbnRlbnQkd3d3D2dkBRBjdGwwMCRuYXZpZ2F0aW9uDw9kBQ9Qcm9udG8gU29jY29yc29kTUucCs6%2BZyLbulTAFPNo569%2B%2BDE%3D&__VIEWSTATEGENERATOR=1A2B14D6&__EVENTVALIDATION=%2FwEWAgK27duvDwKDm%2B%2FCCycw%2FWHLOR5AmzLF035J86RYL0wa&__ASYNCPOST=true",
  CURLOPT_HTTPHEADER => array(
    "cache-control: no-cache",
    "content-type: application/x-www-form-urlencoded"
  ),
));

$response = curl_exec($curl);
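The `__VIEWSTATE` and `__EVENTVALIDATION` values hard-coded above were captured from one page load, so the server may stop accepting them at some point. One possible workaround, sketched here under assumptions, is to GET the page first, read its hidden inputs, and rebuild the POST body from them. The helper names `extractHiddenFields` and `buildTimerPostBody` are hypothetical, and the sample HTML is made up so that the snippet runs without network access; only the control names are taken from the request above.

```php
<?php
// Hypothetical helper: collect every <input type="hidden"> as name => value.
function extractHiddenFields(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

// Hypothetical helper: build the timer postback body from fresh hidden fields.
// The control names are copied from the POST body shown in this answer.
function buildTimerPostBody(array $hiddenFields): string
{
    $timer = 'ctl00$MainContent$Timer1';
    $fields = array_merge($hiddenFields, [
        'ctl00$ScriptManager1' => 'ctl00$MainContent$UpdatePanel1|' . $timer,
        '__EVENTTARGET'        => $timer,
        '__EVENTARGUMENT'      => '',
        '__ASYNCPOST'          => 'true',
    ]);
    // http_build_query() URL-encodes both keys and values.
    return http_build_query($fields);
}

// Made-up markup standing in for the page returned by a first GET request.
$sampleHtml = '<html><body><form>'
    . '<input type="hidden" name="__VIEWSTATE" value="abc+/=" />'
    . '<input type="hidden" name="__EVENTVALIDATION" value="xyz" />'
    . '</form></body></html>';

$body = buildTimerPostBody(extractHiddenFields($sampleHtml));
echo $body, "\n";
```

With a real page, `$body` could be passed as `CURLOPT_POSTFIELDS` in place of the hard-coded string; whether the server accepts it depends on how the page validates postbacks.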

Then apply your XPath:

$dom = new DOMDocument();
// Parse the body returned by the POST request ($response, not $data)
@$dom->loadHTML($response);

$xpath = new DOMXPath($dom);
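One thing to watch when reusing an XPath copied from the browser: browsers add `tbody` elements to the DOM of a table even when the markup has none, while `DOMDocument::loadHTML()` keeps the markup as written. A path containing `/tbody/` may therefore match nothing. The sketch below illustrates this on made-up sample HTML; the `panel` id and the cell contents are assumptions, not the real page's structure.

```php
<?php
// Made-up sample markup; the real page's structure may differ.
$sampleHtml = '<html><body><div id="panel"><table>'
    . '<tr><td>Rosso</td><td>Giallo</td></tr>'
    . '<tr><td>2</td><td>7</td></tr>'
    . '</table></div></body></html>';

$dom = new DOMDocument();
// Collect libxml warnings instead of silencing them with @.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($sampleHtml);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// Browser-style path with a tbody step; the parsed markup has no tbody.
$withTbody = $xpath->query('//div[@id="panel"]/table/tbody/tr[2]/td');

// Same path without the tbody step.
$withoutTbody = $xpath->query('//div[@id="panel"]/table/tr[2]/td');

$values = [];
foreach ($withoutTbody as $cell) {
    $values[] = trim($cell->nodeValue);
}

echo 'matches with tbody: ', $withTbody->length, "\n";
echo 'values without tbody: ', implode(', ', $values), "\n";
```

A short relative path anchored on an id or class tends to survive layout changes better than a long absolute path starting at `/html/body`.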

Hope that helps.