I am currently looking into some automation to read web page data. So, is it possible to read the following kind of table from a web page into Excel? The Excel values should be the name of the condition, Operator, and Expressions.
EDIT
>>> from urllib import urlopen
>>> from bs4 import BeautifulSoup
>>> source = BeautifulSoup(urlopen(url))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'url' is not defined
>>> source = BeautifulSoup(urlopen(https://demo.aravo.com))
File "<stdin>", line 1
source = BeautifulSoup(urlopen(https://demo.aravo.com))
^
SyntaxError: invalid syntax
>>> from urllib import urlopen
>>> from bs4 import BeautifulSoup
>>> source = BeautifulSoup(urlopen(https://demo.aravo.com/))
File "<stdin>", line 1
source = BeautifulSoup(urlopen(https://demo.aravo.com/))
^
SyntaxError: invalid syntax
>>> source = BeautifulSoup(urlopen(demo.aravo.com/))
File "<stdin>", line 1
source = BeautifulSoup(urlopen(demo.aravo.com/))
^
SyntaxError: invalid syntax
>>> source = BeautifulSoup(urlopen(demo.aravo.com))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'demo' is not defined
>>>
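Both errors in the transcript above come from passing the URL as bare text instead of a quoted string: `urlopen(https://demo.aravo.com)` is a syntax error, and `urlopen(demo.aravo.com)` is read as an undefined name. A minimal sketch of the fix (shown here with Python 3's `urllib.parse`, since the transcript uses Python 2):

```python
from urllib.parse import urlparse

# The URL must be a quoted string literal, not bare text:
url = "https://demo.aravo.com/"   # a str object, safe to pass to urlopen(url)
parsed = urlparse(url)
print(parsed.scheme)   # https
print(parsed.netloc)   # demo.aravo.com
```

With the URL stored as a string in `url`, the earlier `BeautifulSoup(urlopen(url))` call no longer raises a `NameError`.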
EDIT2
C:\Users>cd..
C:\>cd cd C:\Python27\selenv\Scripts
The filename, directory name, or volume label syntax is incorrect.
C:\>cd C:\Python27\selenv\Scripts
C:\Python27\selenv\Scripts>python
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib import urlopen
>>> from bs4 import BeautifulSoup
>>> source = BeautifulSoup(urlopen("https://demo.aravo.com/"))
>>> tables = source.findAll('td')
>>> import csv
>>> writer = csv.writer(open('filename.csv','w'))
>>> writer.writerow(rows)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'rows' is not defined
>>>
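The `NameError` at the end of the transcript happens because `rows` was never defined before `writer.writerow(rows)` was called. A minimal Python 3 sketch of defining it first (the values below are hypothetical stand-ins for text extracted from the scraped `<td>` cells):

```python
import csv
import io

# 'rows' must exist before writerow is called; these values are
# made-up examples of text pulled from the table cells:
rows = ["name of condition", "Operator", "Expressions"]

buf = io.StringIO()          # write to memory here instead of filename.csv
writer = csv.writer(buf)
writer.writerow(rows)        # one CSV row, one column per list item
print(buf.getvalue().strip())
```

This prints `name of condition,Operator,Expressions`, which Excel can open directly once saved to a `.csv` file.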
Thanks
Answer 0 (score: 1)
It is possible. Have a look at a library called Beautiful Soup; it will simplify the process of pulling the right information out of the page after you scrape it.
#!/usr/bin/env python
# Drive Firefox with Selenium and grab the rendered page source (Python 2 syntax)
from selenium import webdriver
browser = webdriver.Firefox()
url = 'http://python.org'
browser.get(url)
page_source = browser.page_source
print page_source
Answer 1 (score: 1)
You can also use urlopen from the urllib library to get the page source, and then parse the HTML with BeautifulSoup:
from urllib import urlopen  # Python 2; in Python 3 this is urllib.request.urlopen
from bs4 import BeautifulSoup  # the module is named bs4, not beautifulSoup
# Get a BeautifulSoup object for the page (url must be a quoted string)
source = BeautifulSoup(urlopen(url))
# Get the list of <td> (table cell) elements from the source
tables = source.findAll('td')
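If BeautifulSoup is not available, the same cell extraction can be sketched with the standard library's `html.parser` (the HTML snippet below is a made-up stand-in for the fetched page; BeautifulSoup's `findAll('td')` does the equivalent in one line):

```python
from html.parser import HTMLParser

# Hypothetical table HTML standing in for the downloaded page source
html = "<table><tr><td>name of condition</td><td>Operator</td><td>Expressions</td></tr></table>"

class TdExtractor(HTMLParser):
    """Collects the text content of every <td> element."""
    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells.append(data)

parser = TdExtractor()
parser.feed(html)
print(parser.cells)   # ['name of condition', 'Operator', 'Expressions']
```

The resulting list of cell strings is what you would then write out as a CSV row.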
The easiest way to save the information for use in Excel is probably to write it out as a .csv file. You can do that with the csv module:

import csv
# 'rows' must be defined first, e.g. the cell text extracted above:
# rows = [cell.get_text() for cell in tables]
writer = csv.writer(open('filename.csv', 'w'))
writer.writerow(rows)
All of these modules are well documented, so you should be able to fill in the gaps.
To make sure these libraries are available, make sure you have easy_install, which you can get through setuptools. Once you have easy_install, type the following into the shell:
easy_install BeautifulSoup
easy_install ipython

(csv and urllib ship with Python's standard library, so they do not need to be installed.)
Then run ipython to get into a live Python environment:
ipython
This will open a Python shell where you can try out the code above. I hope this helps. If you need more help with the basics, search the web for a Python tutorial. [scraperwiki][3] has some good examples of web parsing in Python.