Question

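I am trying to write a scraping script in Python 2.7.

The request itself works, but I am having trouble parsing the table with Beautiful Soup. I have tried many things and searched the forums extensively, but this is my first time doing this and nothing has worked.

Here is the code: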

 import requests, os 
 from bs4 import BeautifulSoup  

 url='http://fse.vdkruijssen.eu/ferrylist.php'
 params={'selectplane':'Cessna 208 Caravan','submit':''}
 response=requests.post(url, data=params) 

 soup = BeautifulSoup(response.text, "html5lib")
 table=soup.find('table')
 print table

But this does not return any table. I am trying to retrieve at least the first and last columns.

Answer 1

soup = BeautifulSoup(response.text, "lxml")

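Change the parser to lxml.

Beautiful Soup supports the HTML parser included in Python's standard library, but it also supports a number of third-party Python parsers. One of them is the lxml parser. Depending on your setup, you can install lxml with one of the following commands: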

$ apt-get install python-lxml

$ easy_install lxml

$ pip install lxml
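
If you are not sure whether lxml is installed on the machine running the script, one option is to try it first and fall back to the standard-library parser. This is only a sketch: `make_soup` is a hypothetical helper name, not part of Beautiful Soup, and it relies on Beautiful Soup raising `FeatureNotFound` when the requested parser is unavailable.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html, preferred="lxml", fallback="html.parser"):
    """Parse html with the preferred parser, or the fallback if it is missing.

    make_soup is a hypothetical helper for illustration only.
    """
    try:
        return BeautifulSoup(html, preferred)
    except FeatureNotFound:
        # Beautiful Soup raises FeatureNotFound when no installed parser
        # matches the requested name, e.g. when lxml is not installed.
        return BeautifulSoup(html, fallback)
```

With the code from the question, this would be called as `soup = make_soup(response.text)`. Note that different parsers can build different trees from malformed HTML, so the fallback may not find the table in exactly the same place.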

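If you do not specify a parser, Beautiful Soup uses the best one that is installed: lxml first, then html5lib, then Python's built-in html.parser.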
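
Once `soup.find('table')` returns a table, the first and last columns asked about in the question can be pulled out row by row. The following is a minimal sketch, not tested against the actual page: `first_and_last_columns` is a hypothetical helper name, `SAMPLE_HTML` is invented data standing in for the real response, and it assumes the first `<table>` in the document is the one you want.

```python
from bs4 import BeautifulSoup


def first_and_last_columns(html, parser="lxml"):
    """Return (first_cell, last_cell) text pairs for each row of the first table.

    Hypothetical helper for illustration; assumes the first <table> in the
    document is the one of interest.
    """
    soup = BeautifulSoup(html, parser)
    table = soup.find("table")
    if table is None:
        return []  # no table found in the parsed document

    rows = []
    for tr in table.find_all("tr"):
        # Accept both header (<th>) and data (<td>) cells.
        cells = tr.find_all(["th", "td"])
        if cells:
            first = cells[0].get_text(strip=True)
            last = cells[-1].get_text(strip=True)
            rows.append((first, last))
    return rows


# Invented sample data, only to show the shape of the input.
SAMPLE_HTML = """
<table>
  <tr><th>Origin</th><th>Distance</th><th>Destination</th></tr>
  <tr><td>AAA</td><td>120</td><td>BBB</td></tr>
  <tr><td>CCC</td><td>200</td><td>DDD</td></tr>
</table>
"""
```

With the response from the question, the call would be `first_and_last_columns(response.text)`. If lxml is not installed, pass `parser="html.parser"` instead.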

