
Time: 2013-11-28 19:27:25

Tags: python beautifulsoup redhat

Beautiful Soup works correctly on my local machine but not on another server.

import urllib2
import bs4

url = urllib2.urlopen("http://www.google.com")
html = url.read()
soup = bs4.BeautifulSoup(html)

print soup




The solution in "Beautiful Soup returning nothing" does not apply to my problem.

1 answer:

Answer 0 (score: 0)


I launched a micro Red Hat instance on AWS. Here is the complete process, starting from the SSH login into the brand-new Red Hat machine.


$ ssh -i key.pem ec2-user@awsip
The authenticity of host 'awsip' can't be established.
RSA key fingerprint is ....
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'awsip' (RSA) to the list of known hosts.
[ec2-user@awsip ~]$ sudo easy_install beautifulsoup4
Searching for beautifulsoup4
Reading http://pypi.python.org/simple/beautifulsoup4/
Installed /usr/lib/python2.6/site-packages/beautifulsoup4-4.3.2-py2.6.egg
Processing dependencies for beautifulsoup4
Finished processing dependencies for beautifulsoup4


[ec2-user@awsip ~]$ python
Python 2.6.6 (r266:84292, May 27 2013, 05:35:12)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> html = urllib2.urlopen("http://www.google.com").read()
>>> soup = BeautifulSoup(html)
>>> print html[:100]
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage"><head><meta content="Search t
>>> print soup.prettify()[:100]
<!DOCTYPE html>
<html itemscope="" itemtype="http://schema.org/WebPage">
  <meta content="Se
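The session above works on Python 2.6.6 with beautifulsoup4 4.3.2 and no extra parser installed. When one machine behaves differently from another, it may help to collect the same facts on both and compare them. The following is only a sketch: `describe_environment` is a hypothetical helper name, and it assumes that a differing optional parser (lxml or html5lib) is a plausible cause, which this thread does not confirm. It is written to run on both Python 2 and Python 3.

```python
from __future__ import print_function

import sys


def describe_environment():
    """Collect versions that can make bs4 behave differently per machine."""
    info = {
        "python": sys.version.split()[0],
        "bs4": None,                   # stays None if bs4 is not importable
        "parsers": ["html.parser"],    # bs4's stdlib-backed parser, always present
    }
    try:
        import bs4
        info["bs4"] = getattr(bs4, "__version__", "unknown")
    except ImportError:
        pass
    # Optional third-party parsers; bs4 prefers them when no parser is named.
    for name in ("lxml", "html5lib"):
        try:
            __import__(name)
        except ImportError:
            continue
        info["parsers"].append(name)
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print(key, "=", value)
```

If the two machines report different parser lists, passing the parser name explicitly, for example `BeautifulSoup(html, "html.parser")`, makes both use the same one.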

To find out whether the fault lies with urllib2 or with bs4, try running this code:

from bs4 import BeautifulSoup

# Parse a literal string so that no network access is involved.
html = """
<div id="1">numberone</div>
<div id="2">numbertwo</div>
"""

print BeautifulSoup(html).find('div', {"id":"1"})


It should print:

<div id="1">numberone</div>
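
The snippet above exercises bs4 on a literal string. The complementary check is to look at the raw bytes that urllib2 returned before bs4 ever sees them. A minimal sketch, assuming that an empty or non-HTML response is one possible failure on the other server; `summarize_response` is a hypothetical helper, not part of urllib2 or bs4, and it runs on both Python 2 and Python 3.

```python
from __future__ import print_function


def summarize_response(raw):
    """Describe a raw response body without involving bs4 at all."""
    if not isinstance(raw, bytes):
        # Accept text as well as bytes so the check works on either input.
        raw = raw.encode("utf-8", "replace")
    head = raw[:1024].lower()
    return {
        "length": len(raw),
        "looks_like_html": b"<html" in head or b"<!doctype html" in head,
    }


if __name__ == "__main__":
    sample = b'<!doctype html><html><body><div id="1">numberone</div></body></html>'
    print(summarize_response(sample))
    print(summarize_response(b""))
```

On the failing server, pass it the result of `urllib2.urlopen("http://www.google.com").read()` from the question. A zero length or a body that does not look like HTML points at the fetch rather than at bs4.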