Question

I am trying to use BeautifulSoup to parse the data in the table at the following URL:

http://hk.warrants.com/home/en/sgdata/list_e.cgi#topsearch

Since the table has no class attribute or id, I cannot locate it with the usual approach of soup.find("table", {"title": "TheTitle"}). Instead, I tried this:

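from urllib.request import urlopen
from bs4 import BeautifulSoup

warrantUrl = 'http://hk.warrants.com/home/en/sgdata/list_e.cgi#topsearch'
warrantPage = urlopen(warrantUrl)
soup = BeautifulSoup(warrantPage, 'html.parser')
# Collect every <tr> on the page, since the table itself has no id or class
table = soup.find_all("tr")
paragraphs = []
for x in table:
    paragraphs.append(str(x))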

I also tried the approach mentioned in the post "Parse table with BeautifulSoup Python", but without success...

Answer 1

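The website builds its content with JavaScript. BeautifulSoup cannot execute JavaScript, so it never sees the generated HTML, and neither can urllib. You need to look into Ghost for Python: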

http://jeanphix.me/Ghost.py/
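
To illustrate why find_all("tr") comes back empty on such a page, here is a minimal sketch. The HTML string is a made-up stand-in for a page that builds its table in the browser, not a copy of the real site; it only assumes the bs4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a page whose table is created by JavaScript.
# The real page's markup is not reproduced here.
STATIC_HTML = """
<html><body>
<div id="data"></div>
<script>
document.getElementById("data").innerHTML =
    "<table><tr><td>10001</td><td>0.25</td></tr></table>";
</script>
</body></html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# The parser only sees the markup as delivered; the script is never run,
# so no <tr> elements exist in the parsed tree.
rows = soup.find_all("tr")

# The row markup is present only as plain text inside the <script> element.
script_text = soup.find("script").string or ""

print("rows found:", len(rows))
print("row markup inside script text:", "<tr>" in script_text)
```

The table markup is there as text inside the script, but it only becomes real elements once something executes that script, which is the job of a browser or a tool such as Ghost.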

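Read the Ghost.py documentation; it is simple and powerful, and similar to requests. It is able to "evaluate" JavaScript and return the resulting values in various ways.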
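
Once the rendered HTML is available from a JavaScript-capable tool, the table can still be found without an id or class, for example by its header text. The sketch below assumes the rendered markup is already in a string. The sample HTML, the column names, and the helper functions find_table_by_headers and table_to_rows are all hypothetical and only illustrate the idea.

```python
from bs4 import BeautifulSoup

# Hypothetical rendered HTML; the real page's columns are not known here.
RENDERED_HTML = """
<table><tr><td>Navigation</td></tr></table>
<table>
  <tr><th>Code</th><th>Name</th><th>Price</th></tr>
  <tr><td>10001</td><td>Sample A</td><td>0.25</td></tr>
  <tr><td>10002</td><td>Sample B</td><td>0.31</td></tr>
</table>
"""


def find_table_by_headers(soup, wanted):
    """Return the first <table> whose <th> cells include every name in wanted."""
    for table in soup.find_all("table"):
        headers = {th.get_text(strip=True) for th in table.find_all("th")}
        if set(wanted) <= headers:
            return table
    return None


def table_to_rows(table):
    """Convert a <table> into a list of rows, each a list of cell strings."""
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


soup = BeautifulSoup(RENDERED_HTML, "html.parser")
table = find_table_by_headers(soup, ["Code", "Price"])
rows = table_to_rows(table) if table is not None else []
for row in rows:
    print(row)
```

Matching on header text is usually more robust than relying on the table's position in the page, although it will need adjusting if the site renames its columns.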

Parsing a table with no id or class attributes using BeautifulSoup in Python
