I have the following code, which I run in Eclipse (Java 7) on my local Win 10 machine and then deploy to a Red Hat server as a servlet running on Tomcat 7:
Document doc = Jsoup.connect("https://www.google.com/search?q=best+hotel+chicago&start=1" + i)
        .ignoreHttpErrors(true)
        .referrer("https://www.google.com")
        .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36")
        .followRedirects(true)
        .timeout(15000)
        .get();
Elements eSet = doc.select(".ads-ad a:not([style*=\"display:none\"])");
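When I run this on my local Win 10 machine, I get a result set of the paid Google ads on the page. When I deploy exactly the same code in a servlet and run it on the Red Hat machine, I get no results for the same input page. I realize that Google generates this page dynamically with JavaScript, but that does not explain the difference, since the code works on my Windows machine.
Why does this happen? What context could explain these different results?
Here are the results on Win 10: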
<a class="_Jwu r-ixVPx5JNUXrk" href="http://www.marriott.com/default.mi" id="vn1s0p1c0" onmousedown="return google.arwt(this)" ontouchstart="return google.arwt(this)" data-preconnect-urls="https://www.marriott.com/,http://clickserve.dartsearch.net/" jsl="$t t-zxXzjt1d4B0;$x 0;">Hotels In Chicago - Marriott's Best Rate Guarantee - marriott.com</a>
<a class="_Jwu r-iRYug3IuGb_A" href="https://www.hotels.com/de1497539-qu4/luxury-hotels-chicago-illinois/" id="vn1s0p2c0" onmousedown="return google.arwt(this)" ontouchstart="return google.arwt(this)" data-preconnect-urls="http://clickserve.dartsearch.net/" jsl="$t t-zxXzjt1d4B0;$x 0;">Top 10 Luxury Hotels in Chicago, Illinois - Hotels.com</a>
<a class="_Qwu" href="/shopping/seller?q=hotels.com&hl=en&sa=X&ved=0ahUKEwjtwO6RlMHYAhXF6yYKHShDBqAQwQYILA">Calificación</a>
<a class="_Bu" href="http://www.harrisinteractive.com/Insights/EquiTrendRankings/2015EquiTrendRankings.aspx?col104=open#CollapsiblePanel104" onmousedown="return rwt(this,'','','','','AOvVaw1aVhv8iFKaZBaie0U2sGTC','','0ahUKEwjtwO6RlMHYAhXF6yYKHShDBqAQ9xsILg','','',event)">2015 Harris Poll EquiTrend</a>
<a class="_Jwu" href="http://www.hotels.com/?pos=HCOM_LATAM&locale=en_MX" id="vads-0-0-2-1-0" onmousedown="return google.arwt(this)">Hotels.com Rewards</a>
<a class="_Jwu" href="http://www.hotels.com/deals/?pos=HCOM_LATAM&locale=en_MX&PSRC=AFF05" id="vads-0-0-2-1-1" onmousedown="return google.arwt(this)">Deal Finder</a>
<a class="_Jwu r-iZU8HqezARjc" href="https://www.jdvhotels.com/hotels/illinois/chicago/talbott-hotel" id="vn1s0p3c0" onmousedown="return google.arwt(this)" ontouchstart="return google.arwt(this)" data-preconnect-urls="http://www.jdvhotels.com/" jsl="$t t-zxXzjt1d4B0;$x 0;">Best Hotel In Downtown Chicago - Stay At The Luxurius Talbott</a>
<a class="_Bu" href="https://travel.usnews.com/Hotels/review-The_Talbott_Hotel-Chicago-Illinois-10014/" onmousedown="return rwt(this,'','','','','AOvVaw3FnKilLnLJYICVDTz1jgVz','','0ahUKEwjtwO6RlMHYAhXF6yYKHShDBqAQ9xsINQ','','',event)">US News</a>
<a class="_Jwu" href="https://www.jdvhotels.com/hotels/illinois/chicago/talbott-hotel/20-east" id="vads-0-0-3-3-0" onmousedown="return google.arwt(this)">Dining Options</a>
<a class="_Jwu" href="https://www.jdvhotels.com/hotels/illinois/chicago/talbott-hotel/meetings" id="vads-0-0-3-3-1" onmousedown="return google.arwt(this)">Meetings And Events</a>
<a class="_Jwu" href="https://www.jdvhotels.com/hotels/illinois/chicago/talbott-hotel/rooms" id="vads-0-0-3-3-2" onmousedown="return google.arwt(this)">Book A Room</a>
<a class="_Jwu" href="https://www.jdvhotels.com/about/careers" id="vads-0-0-3-3-3" onmousedown="return google.arwt(this)">Career Opportunities</a>
<a class="_Jwu" href="https://www.jdvhotels.com/hotels/illinois/chicago/talbott-hotel/amenities" id="vads-0-0-3-3-4" onmousedown="return google.arwt(this)">Amenities Offered</a>
<a href="/aclk?sa=l&ai=DChcSEwjw2PaRlMHYAhUeucAKHaz8ChEYABAGGgJpbQ&sig=AOD64_3mOywZoffWgbDbEUhIf8i_92e1tw&q=&ctype=107&ved=0ahUKEwjtwO6RlMHYAhXF6yYKHShDBqAQmxAIPQ&adurl=" aria-hidden="true"><span class="_J2b"></span><span class="_vnd">20 E Delaware Pl, Chicago, IL</span></a>
<a href="javascript:void(0)" data-theme="0" data-width="-2" class="g-bbll ioz7YayzbDxY--XRAQhnJXLU" aria-haspopup="true" role="button" jsaction="r.saTe4DDW138" data-rtid="ioz7YayzbDxY" jsl="$x 1;" data-ved="0ahUKEwjtwO6RlMHYAhXF6yYKHShDBqAQ_kAIPg"><span class="_G2b"><span>Hoy abierto · Abierto las 24 horas</span><span class="mn-dwn-arw"></span></span></a>
<a class="_Jwu r-iocqcJJM2qAc" href="https://www.booking.com/city/us/chicago.html" id="vn1s0p4c0" onmousedown="return google.arwt(this)" ontouchstart="return google.arwt(this)" data-preconnect-urls="http://www.booking.com/" jsl="$t t-zxXzjt1d4B0;$x 0;">Best Hotels in Chicago, IL - Lowest price guarantee - booking.com</a>
<a class="_Jwu r-iRsLwQlU8vHQ" href="http://www.luxuryhotelsguides.com/?ufi=20033173" id="vn1s3p1c0" onmousedown="return google.arwt(this)" ontouchstart="return google.arwt(this)" data-preconnect-urls="http://www.luxuryhotelsguides.com/" jsl="$t t-zxXzjt1d4B0;$x 0;">Top 10 Luxurious Hotels Chicago - Best 5 Star Luxurious Hotels Chicago</a>
<a class="_Jwu" href="http://luxuryhotelsguides.com/?id=top10best" id="vads-3-0-1-1-0" onmousedown="return google.arwt(this)">Top 10 Best Luxury Hotels</a>
<a class="_Jwu" href="http://luxuryhotelsguides.com/?id=coolhotels" id="vads-3-0-1-1-1" onmousedown="return google.arwt(this)">Cool Luxury Hotels</a>
<a class="_Jwu" href="http://luxuryhotelsguides.com?id=luxurysl" id="vads-3-0-1-1-2" onmousedown="return google.arwt(this)">5 Star Luxury Hotels</a>
<a class="_Jwu" href="http://luxuryhotelsguides.com/?id=fast" id="vads-3-0-1-1-3" onmousedown="return google.arwt(this)">Fast & Easy Hotel Booking</a>
On Red Hat, this list is empty, and no error is raised. I do understand that this violates Google's terms of service, but that is not the question.
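To narrow down where the two environments diverge, one option is to log a short summary of the raw response on both machines before running the selector. The sketch below is only an illustration: ResponseProbe, summarize, and countOccurrences are hypothetical names, not part of Jsoup, and the sample markup in main is made up. It assumes the status code, final URL, and body would be taken from the Connection.Response that Jsoup returns from execute() (via statusCode(), url(), and body()) instead of calling get() directly.

class ResponseProbe {
    // Hypothetical helper: builds a one-line summary of what the server sent back,
    // so the same line can be compared between the Win 10 and Red Hat logs.
    public static String summarize(int status, String finalUrl, String body) {
        return "status=" + status
                + " url=" + finalUrl
                + " length=" + body.length()
                + " adsAdMarkers=" + countOccurrences(body, "ads-ad");
    }

    // Counts non-overlapping occurrences of marker in text.
    public static int countOccurrences(String text, String marker) {
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(marker, from)) != -1) {
            count++;
            from += marker.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample markup standing in for a real response body.
        String sample = "<li class=\"ads-ad\"><a href=\"#\">x</a></li><li class=\"ads-ad\"></li>";
        System.out.println(summarize(200, "https://www.google.com/search?q=test", sample));
    }
}

If the Red Hat log shows a different status code, a redirected URL, or zero ads-ad markers in the raw body, the difference is in what the server received, not in the selector.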