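I have a script that used to work, but the website it downloads a file from has apparently changed its format in some way. I have updated the content and headers of the POST request to what I think they should be, but it is not pulling the file as I expect. Here is the relevant snippet of my script as it stands now: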
$url='http://mansfield.tea.state.tx.us/TEA.AskTED.Web/Forms/DownloadFile.aspx?';
$header= "Host: mansfield.tea.state.tx.us\r\n";
$header.= "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0\r\n";
$header.= "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n";
$header.= "Accept-Language: en-GB,en;q=0.5\r\n";
$header.= "Accept-Encoding: gzip, deflate\r\n";
$header.= "Referer: http://mansfield.tea.state.tx.us/TEA.AskTED.Web/Forms/DownloadFile.aspx\r\n";
$header.= "Content-Type: application/x-www-form-urlencoded\r\n";
$header.= "Content-Length: 14067\r\n";
$header.= "Cookie: ga=GA1.3.400055834.1504257175; ASP.NET_SessionId=cj020m45uwj5hpmwaclhvmuk\r\n";
$header.= "Connection: keep-alive\r\n";
$header.= "Upgrade-Insecure-Requests: 1\r\n";
$postdata= array(
"__VIEWSTATE" => "/wEPDwULLTE3NDczMDI1MTIPZBYCAgEPZBYEAgMPFCsABWRkZBQrAAcQFg4eBkl0ZW1JRAURX2N0bDAtbWVudUl0ZW0wMDAeCEl0ZW1UZXh0BVI8YSBpZD0iaHlwZXJsaW5rMSIgaHJlZj0iL1RFQS5Bc2tURUQuV2ViL0Zvcm1zL0hvbWUuYXNweCIgY2xhc3M9Im1lbnVOYXYiPkhvbWU8L2E+HgdJdGVtVVJMBRF+L0Zvcm1zL0hvbWUuYXNweB4PTWVudUl0ZW1Ub29sVGlwBQRIb21lHhBNZW51SXRlbUNzc0NsYXNzBRJob3Jpem9udGFsTWVudUl0ZW0eFUl0ZW1Nb3VzZU92ZXJDc3NDbGFzcwUWaG9yaXpvbnRhbE1lbnVTZWxlY3RlZB4LSXRlbVNlY3VyZWRoZGQQFgwfAAURX2N0bDAtbWVudUl0ZW0wMDEfAQVNPGEgaHJlZj0iL1RFQS5Bc2tURUQuV2ViL0Zvcm1zL1NlYXJjaE1h…m9udGFsTWVudUl0ZW0fBQUWaG9yaXpvbnRhbE1lbnVTZWxlY3RlZB8GaGRkFCsAAQUHdGVhdGVtcGQCDQ8QZA8WCWYCAQICAgMCBAIFAgYCBwIIFgkQBQ1TY2hvb2wgTnVtYmVyBQ1TY2hvb2wgTnVtYmVyZxAFC1NjaG9vbCBOYW1lBQtTY2hvb2wgTmFtZWcQBQ1EaXN0cmljdCBOYW1lBQ1EaXN0cmljdCBOYW1lZxAFC0NvdW50eSBOYW1lBQtDb3VudHkgTmFtZWcQBQZSZWdpb24FBlJlZ2lvbmcQBQtTY2hvb2wgQ2l0eQULU2Nob29sIENpdHlnEAUPU2Nob29sIFppcCBDb2RlBQ9TY2hvb2wgWmlwIENvZGVnEAUNRGlzdHJpY3QgQ2l0eQUNRGlzdHJpY3QgQ2l0eWcQBRFEaXN0cmljdCBaaXAgQ29kZQURRGlzdHJpY3QgWmlwIENvZGVnZGRke3qSSaoJbwyFyN/A1p+yD+sPADY=",
"__VIEWSTATEGENERATOR" => "44F2C40C",
"btnDownloadFile" => "Download+File",
"ddlSortOrder" => "School+Number"
);
$opts = array(
'http' => array(
'method' => 'POST',
'content' => http_build_query($postdata),
'header' => $header
)
);
$context = stream_context_create($opts);
$file = file_get_contents($url, false, $context);
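It should return a file containing a list of Texas schools and school data, but it doesn't.
I took the header ($header) and content ($postdata) values from the browser's web developer console. The site I am pulling the data from is http://mansfield.tea.state.tx.us/TEA.AskTED.Web/Forms/DownloadFile.aspx.
Any ideas on how to fix these headers and content so that the file downloads from the command line using the PHP CLI?
Thanks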
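Answer 0 (score: 1)
The ViewState changes on every request, so scrape the current __VIEWSTATE value from the page with simple_html_dom and pass it along in the POST data.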
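If pulling in a third-party library is not an option, the same scraping step can probably be done with PHP's built-in DOM extension. The following is only a rough sketch: extract_hidden_fields() is a hypothetical helper name, and the sample markup is made up to stand in for the real form, which may contain other hidden fields.

```php
<?php
// Hypothetical helper (not from the original answer): collect every
// hidden <input> on a page into a name => value array.
function extract_hidden_fields(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid markup, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = array();
    foreach ($doc->getElementsByTagName('input') as $input) {
        $isHidden = strtolower($input->getAttribute('type')) === 'hidden';
        $name = $input->getAttribute('name');
        if ($isHidden && $name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Made-up sample markup standing in for the real download form.
$sample = '<form>'
        . '<input type="hidden" name="__VIEWSTATE" value="abc123=" />'
        . '<input type="hidden" name="__VIEWSTATEGENERATOR" value="44F2C40C" />'
        . '<input type="submit" name="btnDownloadFile" value="Download File" />'
        . '</form>';

$fields = extract_hidden_fields($sample);
echo $fields['__VIEWSTATE'], "\n";
```

Collecting all hidden inputs, rather than only __VIEWSTATE, means that values such as __VIEWSTATEGENERATOR would not need to be hard-coded either.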
Here is the working code, using simple_html_dom:
<?php
include_once('simple_html_dom.php');
$url="http://mansfield.tea.state.tx.us/TEA.AskTED.Web/Forms/DownloadFile.aspx";
$html=file_get_html($url);
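// Select the field by name rather than by position, in case the order of the page's inputs changes.
$viewstate = $html->find('input[name=__VIEWSTATE]', 0)->value;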
$url='http://mansfield.tea.state.tx.us/TEA.AskTED.Web/Forms/DownloadFile.aspx?';
$header= "Host: mansfield.tea.state.tx.us\r\n";
$header.= "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0\r\n";
$header.= "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n";
$header.= "Accept-Language: en-GB,en;q=0.5\r\n";
// Accept-Encoding is left out: file_get_contents() does not decompress gzip/deflate responses.
$header.= "Referer: http://mansfield.tea.state.tx.us/TEA.AskTED.Web/Forms/DownloadFile.aspx\r\n";
$header.= "Content-Type: application/x-www-form-urlencoded\r\n";
// Content-Length is left out: the body size changes with the ViewState, and PHP's http wrapper sets this header itself.
$header.= "Cookie: ga=GA1.3.400055834.1504257175; ASP.NET_SessionId=cj020m45uwj5hpmwaclhvmuk\r\n";
$header.= "Connection: keep-alive\r\n";
$header.= "Upgrade-Insecure-Requests: 1\r\n";
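$postdata= array(
"__VIEWSTATE" => $viewstate,
"__VIEWSTATEGENERATOR" => "44F2C40C",
// http_build_query() URL-encodes the values, so use plain spaces here;
// a literal "+" would be sent as %2B.
"btnDownloadFile" => "Download File",
"ddlSortOrder" => "School Number"
);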
$opts = array(
'http' => array(
'method' => 'POST',
'content' => http_build_query($postdata),
'header' => $header
)
);
$context = stream_context_create($opts);
$file = file_get_contents($url, false, $context);
echo $file;
?>
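One detail of the request body is worth checking when copying values from the browser's developer console. http_build_query() URL-encodes every value itself, so a space becomes + on its own, while a value that already contains a literal + is encoded again as %2B. The short sketch below only illustrates that behaviour; the field names come from the code above, and whether the server actually cares about the button's value is an assumption that has not been verified.

```php
<?php
// Values written the way the form holds them: plain spaces, no manual "+".
$fields = array(
    "btnDownloadFile" => "Download File",
    "ddlSortOrder"    => "School Number",
);
$body = http_build_query($fields);
echo $body, "\n"; // btnDownloadFile=Download+File&ddlSortOrder=School+Number

// A value copied from the raw request body already contains "+",
// so it gets encoded a second time.
$doubleEncoded = http_build_query(array("btnDownloadFile" => "Download+File"));
echo $doubleEncoded, "\n"; // btnDownloadFile=Download%2BFile
```

For the same reason, a fixed Content-Length copied from an earlier request is unlikely to match a body built from a fresh ViewState, so it seems safer not to hard-code that header.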