I am using mechanize.Browser().open("http://urltoscrape.com").
The weird thing is that when I look at the markup returned by mechanize.Browser().response().read(), it doesn't contain the <a> link element. However, if I browse the website in Firefox, I can see the <a> link element in the source.
This is what I get from mechanize.Browser().response().read():
<script type="text/javascript" language="javascript">
dodo4("PGEgaHJlZj0iZ3JhcGhpcy1nYWxzLV8xNTAtLS1lbGVtZW50LWNyeXN0YWwtMTYtMi04Lmh0bWwiIHRpdGxlPSJZdW1hIEFzYW1pIC8gZ3JhcGhpcyBnYWxzICMxNTAgLSBlbGVtZW50IGNyeXN0YWwgbmV4dCAxNiBwaWN0dXJlcyIgIG9uTW91c2VPdmVyPSJzd2FwKCduZXh0JywxKSIgb25Nb3VzZU91dD0ic3dhcCgnbmV4dCcsMCkiIG9uQ2xpY2s9InNob3dpdD1mYWxzZSI+PGltZyBuYW1lPSJuZXh0IiBzcmM9Imh0dHA6Ly9pbWcuYm9ieC5jb20vaW1hZ2VzL25leHQwLmdpZiIgYm9yZGVyPSIwIiBBTFQ9Im5leHQiIFdJRFRIPSIzMiIgSEVJR0hUPSIyNCIgQUxJR049IlJJR0hUIj48L0E+");
</script>
But in Firefox I also see the link, directly below this JavaScript code:
<script language="javascript" type="text/javascript">
dodo4("PGEgaHJlZj0iZ3JhcGhpcy1nYWxzLV8xNTAtLS1lbGVtZW50LWNyeXN0YWwtMTYtMi04Lmh0bWwiIHRpdGxlPSJZdW1hIEFzYW1pIC8gZ3JhcGhpcyBnYWxzICMxNTAgLSBlbGVtZW50IGNyeXN0YWwgbmV4dCAxNiBwaWN0dXJlcyIgIG9uTW91c2VPdmVyPSJzd2FwKCduZXh0JywxKSIgb25Nb3VzZU91dD0ic3dhcCgnbmV4dCcsMCkiIG9uQ2xpY2s9InNob3dpdD1mYWxzZSI+PGltZyBuYW1lPSJuZXh0IiBzcmM9Imh0dHA6Ly9pbWcuYm9ieC5jb20vaW1hZ2VzL25leHQwLmdpZiIgYm9yZGVyPSIwIiBBTFQ9Im5leHQiIFdJRFRIPSIzMiIgSEVJR0hUPSIyNCIgQUxJR049IlJJR0hUIj48L0E+");
</script>
<a onclick="showit=false" onmouseout="swap('next',0)" onmouseover="swap('next',1)" title="hello world" href="next-page.html">
</a>
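Answer 0 (score: 1)
Try setting the user agent before making any requests (in mechanize, through the browser's addheaders attribute).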
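If the response still lacks the link after that, it may be worth checking whether the link is written into the page by the script itself, since mechanize does not execute JavaScript. The argument passed to dodo4 looks like Base64: its first characters, PGEgaHJlZj0i, decode to <a href=". The definition of dodo4 is not shown in the question, so the following is only a sketch under the assumption that dodo4 Base64-decodes its argument and writes the result into the document. The names decode_dodo4_payloads, extract_links and HrefCollector are made up for illustration, and the sample page is built from the anchor shown in the Firefox source rather than fetched from the site.

```python
import base64
import re
from html.parser import HTMLParser

# Matches dodo4("...") calls and captures the quoted Base64 argument.
DODO4_CALL = re.compile(r'dodo4\(\s*"([A-Za-z0-9+/=]+)"\s*\)')


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> works too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def decode_dodo4_payloads(page_source):
    """Return the decoded markup of every dodo4("...") call in page_source.

    Assumption: the argument is plain Base64 of an HTML fragment.
    """
    fragments = []
    for payload in DODO4_CALL.findall(page_source):
        # Pad to a multiple of 4 in case trailing '=' characters were dropped.
        padded = payload + "=" * (-len(payload) % 4)
        decoded = base64.b64decode(padded).decode("utf-8", errors="replace")
        fragments.append(decoded)
    return fragments


def extract_links(page_source):
    """Return the href values found inside all decoded dodo4 fragments."""
    collector = HrefCollector()
    for fragment in decode_dodo4_payloads(page_source):
        collector.feed(fragment)
    return collector.hrefs


if __name__ == "__main__":
    # Hypothetical stand-in for the string returned by response().read().
    sample_anchor = '<a href="next-page.html" title="hello world"></a>'
    encoded = base64.b64encode(sample_anchor.encode("utf-8")).decode("ascii")
    sample_page = (
        '<script type="text/javascript">\n'
        'dodo4("%s");\n'
        "</script>" % encoded
    )
    print(extract_links(sample_page))
```

In real use, the string passed to extract_links would be the body returned by response().read(), decoded to text first if it comes back as bytes. If the decoded fragments turn out not to be HTML, the assumption about dodo4 is wrong and the script's definition needs to be inspected instead.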