DOM received from the website has no <a> link

Time: 2015-05-23 13:38:14

Tags: python xpath mechanize

I am using mechanize.Browser().open("http://urltoscrape.com").

The weird thing is that the DOM I get from mechanize.Browser().response().read() doesn't contain the <a> link element. However, if I browse the website using Firefox, I can see the <a> link element in the source.

I get this from mechanize.Browser().response().read():

<script type="text/javascript" language="javascript">
dodo4("PGEgaHJlZj0iZ3JhcGhpcy1nYWxzLV8xNTAtLS1lbGVtZW50LWNyeXN0YWwtMTYtMi04Lmh0bWwiIHRpdGxlPSJZdW1hIEFzYW1pIC8gZ3JhcGhpcyBnYWxzICMxNTAgLSBlbGVtZW50IGNyeXN0YWwgbmV4dCAxNiBwaWN0dXJlcyIgIG9uTW91c2VPdmVyPSJzd2FwKCduZXh0JywxKSIgb25Nb3VzZU91dD0ic3dhcCgnbmV4dCcsMCkiIG9uQ2xpY2s9InNob3dpdD1mYWxzZSI+PGltZyBuYW1lPSJuZXh0IiBzcmM9Imh0dHA6Ly9pbWcuYm9ieC5jb20vaW1hZ2VzL25leHQwLmdpZiIgYm9yZGVyPSIwIiBBTFQ9Im5leHQiIFdJRFRIPSIzMiIgSEVJR0hUPSIyNCIgQUxJR049IlJJR0hUIj48L0E+");
</script>

But in Firefox I also see the link, just below this JavaScript code:

<script language="javascript" type="text/javascript">
dodo4("PGEgaHJlZj0iZ3JhcGhpcy1nYWxzLV8xNTAtLS1lbGVtZW50LWNyeXN0YWwtMTYtMi04Lmh0bWwiIHRpdGxlPSJZdW1hIEFzYW1pIC8gZ3JhcGhpcyBnYWxzICMxNTAgLSBlbGVtZW50IGNyeXN0YWwgbmV4dCAxNiBwaWN0dXJlcyIgIG9uTW91c2VPdmVyPSJzd2FwKCduZXh0JywxKSIgb25Nb3VzZU91dD0ic3dhcCgnbmV4dCcsMCkiIG9uQ2xpY2s9InNob3dpdD1mYWxzZSI+PGltZyBuYW1lPSJuZXh0IiBzcmM9Imh0dHA6Ly9pbWcuYm9ieC5jb20vaW1hZ2VzL25leHQwLmdpZiIgYm9yZGVyPSIwIiBBTFQ9Im5leHQiIFdJRFRIPSIzMiIgSEVJR0hUPSIyNCIgQUxJR049IlJJR0hUIj48L0E+");
</script>
<a onclick="showit=false" onmouseout="swap('next',0)" onmouseover="swap('next',1)" title="hello world" href="next-page.html">
</a>
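
A possible explanation, offered as an assumption because the source of dodo4 is not shown: the argument passed to dodo4 begins with PGEgaHJlZj0i, which is Base64 for `<a href="`, so the function probably decodes the string and writes the link into the page. Firefox's DOM view shows the page after scripts have run, while mechanize does not execute JavaScript. Under that assumption, the link can be recovered by decoding the payload directly. The helper names and the sample markup below are hypothetical, and the sample is generated in the code rather than copied from the site.

```python
import base64
import re

# Hypothetical stand-in for the fetched page: a link hidden inside a dodo4("...") call.
sample_link = '<a href="next-page.html" title="hello world">next</a>'
encoded = base64.b64encode(sample_link.encode("ascii")).decode("ascii")
page_source = '<script type="text/javascript">\ndodo4("%s");\n</script>' % encoded

# Matches dodo4("<base64 text>") and captures the Base64 argument.
DODO4_CALL = re.compile(r'dodo4\("([A-Za-z0-9+/=]+)"\)')


def decode_dodo4_payloads(html):
    """Return the decoded HTML fragment for every dodo4("...") call in html."""
    return [
        base64.b64decode(payload).decode("utf-8", errors="replace")
        for payload in DODO4_CALL.findall(html)
    ]


def extract_hrefs(fragment):
    """Pull href values out of a decoded fragment with a simple regex."""
    return re.findall(r'href="([^"]+)"', fragment)


fragments = decode_dodo4_payloads(page_source)
links = [href for fragment in fragments for href in extract_hrefs(fragment)]
```

With a real response, the text returned by response().read() would take the place of page_source (decoded to str first if it arrives as bytes).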

1 answer:

Answer 0 (score: 1)

Try setting the user agent before making any request.
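
A minimal sketch of that suggestion, using the standard library's urllib.request opener as a stand-in so it runs without extra packages. mechanize.Browser exposes the same addheaders list, as shown in the comment. The user-agent string is only an example value, and make_opener is a hypothetical helper name.

```python
import urllib.request

# Example value only; any browser-like user-agent string could be used here.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"


def make_opener(user_agent=USER_AGENT):
    """Build an opener whose requests carry the given User-agent header."""
    opener = urllib.request.build_opener()
    # Replace the default "Python-urllib/x.y" header before any request is made.
    opener.addheaders = [("User-agent", user_agent)]
    return opener


opener = make_opener()

# The mechanize equivalent (requires the mechanize package):
#   br = mechanize.Browser()
#   br.addheaders = [("User-agent", USER_AGENT)]
#   br.open("http://urltoscrape.com")

# html = opener.open("http://urltoscrape.com").read()  # network call, left commented out
```

A different user agent only changes what the server chooses to send. It does not make the client execute JavaScript.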
