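I have a script that fetches products from AliExpress and parses their information, but it stopped working a few weeks ago.
I noticed that my cURL GET request no longer behaves as it used to: it now gets redirected to the login page. Chrome can access the URL without any problem, and I also tried an online cURL tool (http://onlinecurl.com/), which works fine too.
This is the URL: es.aliexpress.com/category/204004798/jackets-coats.html
And this is the script I am using: https://gist.github.com/xAlstrat/29a4b01cd5b2c153f9a6
I also tried sending the request headers that Chrome shows in its developer tools (the $header array in my script), but the result is the same: the request is redirected.
Could AliExpress be filtering my requests? If so, why does onlinecurl work?
Thanks in advance.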
Answer 0 (score: 1)
This is my solution.
<?php
require "simplehtmldom/simple_html_dom.php";
$url = 'https://www.aliexpress.com/item/2017-New-Summer-Fashion-Mens-T-Shirts-Slim-Fit-Short-Sleeve-T-Shirt-032/32798976881.html';
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, $url );
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
// The HTTP/2 pseudo-headers that Chrome displays (:authority, :method, :path, :scheme)
// are not regular header fields; cURL derives them from the URL, so they are not set here.
'accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
// 'accept-encoding:gzip, deflate, sdch, br',
'accept-language:en-US,en;q=0.8',
'cache-control:no-cache',
// Cookies copied from a browser session; they expire, so replace them with your own.
'cookie:ali_apache_id=10.181.15.98.1496226301876.152542.8; ali_beacon_id=10.181.15.98.1496226301876.152542.8; aep_common_f=aEhfbgrZ7e4e9UmjuOWm1CANyaLLaqTDc80Oan6/v5duIrtz3XfvQQ==; cna=BIi1EWdCs1QCAS43pj2EYRUu; _uab_collina=149806153775875750712574; _mle_tmp0=eNrz4A12DQ729PeL9%2FV3cfUx8KvOTLFScnF1dDR1dDV2dLU0sDSxNLEwsTR0cjN2dHI0MTM0NzNW0kkusTI0sbQwMTczMjExtDDSSUxGE8itsDKojQIApLsX8Q%3D%3D; aep_history=keywords%5E%0Akeywords%09%0A%0Aproduct_selloffer%5E%0Aproduct_selloffer%0932788134848%0932798976881; xman_us_t=x_lid=bg1134819498lzch&sign=y&x_user=bUBd5Jk0irdh4KaPNADUTaZgZPbhimHjaVryhYI9YCI=&ctoken=18y_131g18r61&need_popup=y&l_source=aliexpress; xman_us_f=zero_order=n&x_locale=en_US&x_l=0&last_popup_time=1496226312427&x_user=BG|Nikola|Mihaylov|ifm|754321409&no_popup_today=n; intl_locale=en_US; aep_usuc_f=isfm=y&site=glo&c_tp=USD&x_alimid=754321409&isb=y&region=BG&b_locale=en_US; intl_common_forever=iP9SzYB6uR11n9PDuziWZV+m6wS5lewBhe0mchSYZzPnAbhr0LZabw==; _ga=GA1.2.1468541176.1496226329; _gid=GA1.2.15678206.1498472399; JSESSIONID=C8CB956B69CC08A9B351C25C869BBF29; _umdata=70724C19E34629158F306CE35FF4594FD67FF1CAB2345D48D8DBB0AA935079C88BF9397A2C0FD39DCD43AD3E795C914C85C011A26645999BFC392F47E8EAF27C; ali_apache_track=mt=1|ms=|mid=bg1134819498lzch; ali_apache_tracktmp=W_signed=Y; acs_usuc_t=acs_rt=4426ebd3d8e64e528a12a78ded645f48&x_csrf=o8z03s_owisa; xman_f=A2GX/Y1rs0cttixwSxevdzKgU2yRTOLeKBX7fdKnMDdXVLpqHphq8YkU+FAR6NBhi7hO5A7zc3/N6J/t/H9GWyHI+NwKJhVmLd4RElBNdA8pz3lnGKz0ftxea14UUDnFXkIy11hLAIx4XZqeotvVY5xqQ2mHDWxJdTYyFGUJp2NUu5sA+A+/LFaZo/EhWIfeloGoaQnpsd2ps2jdCJ6JyDiKEeiASt17bEr5yszxaPRfbd7hfZUA4PIxgI7v5xSjj851b0fxepl6UbWi17mb+nk2O1VDuqt3mITEb3d0ZfzzZpWy9G6CuVbj/20I8NLAlFY52SkZtltbyMAImZfaIaEFhz350zco; xman_t=jf4jAohRGS5aZgdMN923jjta0ojsSeWEwqlEWooob6QtKTtCgvI1+gCB8pCzof8e0Fa+5dsa4Qdi4uZStlIJ5bhyvUdv6xIqA03ua02HlZ8MwuJUAcQ1PvyeTvJiNRtUzKZ0KN4FeDGdPD02q9xuly6mAD7dPmkJ2xZpswuxBMKMJ7Cu0OGB//ooxRvcjMzbvPxf9q/Q/+uySj2V3tLsQ3uAMOETjIklOh/aoXkZNWDy9WX2fRsAanRANkHxb1MoL3hntz7XDtYCx1MZis9/cAsg76m72h+lz+6pSbIZeJlH6kt9G3JkNafhR9EOgaxiGS/dr3S1TVPKJ8Q58yZMKhjrNQ2yCNzpq1SkWZs7LMNIcGVvDq9kxhYryBgx9ADN4ezb4uT7kYAZaz09myq8tIdlP6P+QHHkADgkeSnsLmodjEednJop66zIkTrowbYq8s/5ggHXHqnfyvDmi/O/00thL/lNG05+KHaOU1VV1y7VXtxxDlWADzQwPk9/AnMWK3VV80qXOsSGXpycloxhz7agZlsrCRV6u+KhsaUqMd/yszE09+GNVwdN7/UeGAklUvUHUAIykqRkpdOp1z0Vi87QWFW5TOn850K36OgHVxdiYxldDmEewuXBMmrreBWz; __utmt=1; __utma=3375712.1468541176.1496226329.1498476312.1498478200.3; __utmb=3375712.5.10.1498478200; __utmc=3375712; __utmz=3375712.1498476312.2.2.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); isg=AgcHaslcEs7cDpY8UNqaqtarlrsRpIzPoZDqYtndZhbiSDMK4NzOPvZIzj7t',
'pragma:no-cache',
'upgrade-insecure-requests:1',
'user-agent:Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
));
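$response = curl_exec($ch);
// Strict comparison: curl_exec() returns false on failure, while an empty body is still a valid response.
if ($response === false) {
    echo "Curl error: " . curl_error($ch) . " (" . curl_errno($ch) . ")";
    curl_close($ch);
    exit;
}
curl_close($ch);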
$dom = str_get_html($response);
echo "<pre>";
var_dump($dom);
echo "</pre>";
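Answer 1 (score: 0)
I ran your script and it works fine. cURL also works perfectly from the command line. Because you have sent a large number of requests to the site or to that specific URL, the most likely explanation is that they have blocked your IP address or are redirecting it to the login page.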
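To check whether a request really ends up on a login page, you can ask cURL where it landed after following redirects. The following is only a minimal sketch: the helper names `fetch_with_redirect_info` and `looks_like_login_redirect` are made up for illustration, and the rule "the final URL contains `login` in its host or path" is an assumption about how the redirect shows up, not something AliExpress documents.

```php
<?php
// Hypothetical helper: fetches $url and reports where the request ended up.
function fetch_with_redirect_info(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow Location: redirects
        CURLOPT_MAXREDIRS      => 5,     // stop on redirect loops
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);
    $info = [
        'ok'            => $body !== false,
        'error'         => curl_error($ch),
        'status'        => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'redirects'     => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
        'effective_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
    ];
    curl_close($ch);
    return $info;
}

// Assumption: a redirect to a login page has "login" in the final host or path.
function looks_like_login_redirect(string $effectiveUrl): bool
{
    $host = (string) parse_url($effectiveUrl, PHP_URL_HOST);
    $path = (string) parse_url($effectiveUrl, PHP_URL_PATH);
    return stripos($host, 'login') !== false || stripos($path, 'login') !== false;
}

// Usage (performs a real network request, so it is left commented out):
// $info = fetch_with_redirect_info('https://es.aliexpress.com/category/204004798/jackets-coats.html');
// if ($info['ok'] && looks_like_login_redirect($info['effective_url'])) {
//     echo "Redirected to a login page after {$info['redirects']} redirect(s)\n";
// }
```

Running the same check from two different networks is a quick way to tell an IP-based block apart from a problem in the request itself.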
I suggest routing cURL through a proxy server to see whether that works for you, e.g. curl_setopt($curl, CURLOPT_PROXY, 'PROXY_IP_HERE');
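A slightly fuller sketch of the proxy suggestion might look like this. The function names `build_curl_options` and `fetch` are hypothetical, and the proxy address in the usage comment is a documentation placeholder (203.0.113.0/24 is reserved for examples), so substitute a proxy you are actually allowed to use.

```php
<?php
// Hypothetical helper: builds the cURL options for $url, optionally via a proxy.
function build_curl_options(string $url, ?string $proxy = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 30,
        CURLOPT_TIMEOUT        => 60,
    ];
    if ($proxy !== null) {
        // Proxy given as "host:port"; omitted entirely for a direct connection.
        $options[CURLOPT_PROXY] = $proxy;
    }
    return $options;
}

// Hypothetical helper: performs the request and throws on transport errors.
function fetch(string $url, ?string $proxy = null): string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $proxy));
    $response = curl_exec($ch);
    if ($response === false) {
        $message = curl_error($ch) . ' (' . curl_errno($ch) . ')';
        curl_close($ch);
        throw new RuntimeException('cURL error: ' . $message);
    }
    curl_close($ch);
    return $response;
}

// Usage (real network request, placeholder proxy address):
// $html = fetch('https://es.aliexpress.com/category/204004798/jackets-coats.html', '203.0.113.10:8080');
```

If the page loads through the proxy but not directly, that supports the theory that the block is tied to your IP address rather than to the headers you send.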
Also, AliExpress provides an API, which you should be able to use.