Question

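I'm a journalism student and completely new to Python. I'm currently trying to convert the table on this site into a CSV so that I can add it to a database. After a lot of troubleshooting and a few YouTube tutorials, I've come up with the following: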

import csv
import urllib.request
from bs4 import BeautifulSoup
f = open('dataoutput.csv', 'w', newline = '')
writer = csv.writer(f)
soup = BeautifulSoup(urllib.request.urlopen("https://www.townofchapelhill.org/town-hall/departments-services/planning-and-sustainability/gis-analytics/development-activity-report").read(), 'lxml')
tbody = soup('table', {"class":"tableData tablesorter tablesorter-blue hasFilters hasStickyHeaders"}) [0].find_all('tr')
for row in tbody: 
    cols = row.findChildren(recursive=False)
    cols = [ele.text.strip() for ele in cols]
    writer.writerow(cols)
    print(cols)
f.close()

Right now the code produces a CSV, but it is empty. In the Mac OS X terminal I get the following error:

as9934-pc:pythonstuff as9934$ python3 ./make.py
Traceback (most recent call last):
  File "./make.py", line 8, in <module>
    tbody = soup('table', {"class":"tableData tablesorter tablesorter-blue hasFilters hasStickyHeaders"}) [0].find_all('tr')
IndexError: list index out of range

The only number I specified is [0], so I'm confused.

Any ideas?

Answer 1

If index zero of a list does not exist, the list must have no elements (that is, it is an empty list). So soup('table', {"class":"tableData tablesorter tablesorter-blue hasFilters hasStickyHeaders"}) is returning an empty list. You can confirm this by checking what len(soup('table', {"class":"tableData tablesorter tablesorter-blue hasFilters hasStickyHeaders"})) returns.
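A minimal sketch of that check, assuming Beautiful Soup is installed. SAMPLE_HTML and find_tables are made-up names for illustration, and the inline HTML is a stand-in for the downloaded page, not the real one. It shows the lookup returning an empty list and a guard that avoids indexing into it:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: it contains no <table> at all.
SAMPLE_HTML = "<html><body><iframe src='report.aspx'></iframe></body></html>"


def find_tables(html, css_class):
    """Return every <table> carrying the given class (possibly an empty list)."""
    soup = BeautifulSoup(html, "html.parser")
    # A single class name matches any table that has it among its classes.
    return soup.find_all("table", class_=css_class)


tables = find_tables(SAMPLE_HTML, "tableData")
print(len(tables))  # an empty result means tables[0] would raise IndexError

if tables:
    rows = tables[0].find_all("tr")
else:
    rows = []
    print("No matching table in the downloaded HTML")
```

Searching for one class name rather than the full class string is a deliberate choice here: a multi-class string only matches when the attribute value is exactly that string.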

Answer 2

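The site loads the table inside an iframe, so you have to request the iframe's own page to get at the table. Use this link (found in <iframe src ...>) as the URL: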

https://gis.townofchapelhill.org/developments/report/report.aspx

along with:

tbody = soup.findAll('table')

Then you will be able to get the rows.
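The iframe approach could be sketched roughly as follows, assuming Beautiful Soup is installed. The helper names (iframe_urls, table_rows, rows_to_csv, scrape) are hypothetical, and the sketch assumes the report page serves its table as plain HTML, which this thread does not confirm:

```python
import csv
import io
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def iframe_urls(html, base_url):
    """Return the absolute URL of every <iframe src=...> found in the page."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, frame["src"])
            for frame in soup.find_all("iframe", src=True)]


def table_rows(html):
    """Return the cell text of each row in the first <table>, or [] if none."""
    table = BeautifulSoup(html, "html.parser").find("table")
    if table is None:
        return []
    return [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            for tr in table.find_all("tr")]


def rows_to_csv(rows):
    """Serialize the rows to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def scrape(url, out_path):
    """Download url and write its first table to out_path (needs network)."""
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    with open(out_path, "w", newline="") as f:
        f.write(rows_to_csv(table_rows(html)))
```

Calling scrape("https://gis.townofchapelhill.org/developments/report/report.aspx", "dataoutput.csv") would then write the file. If table_rows returns an empty list for that page, the table is probably built by a script and this approach alone will not be enough.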

Answer 3

The page uses JavaScript, so the complete table data is in the script. Try adding this to your code:

with open("test.html", "w") as file:
   file.write(str(soup))

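Open test.html in a browser.
Open the same file in a text editor. The difference will be visible: the table content does not appear in the text editor, but you can see the table in the browser.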
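The same comparison can be done without opening a browser. The sketch below uses only the standard library to count a few tags in the raw HTML; TagCounter and count_tags are hypothetical names, and the sample page is invented for illustration. Zero table rows alongside iframe or script tags suggests the table is filled in after the page loads:

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count opening tags in static HTML; text inside <script> is not parsed."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(html, tags=("table", "tr", "iframe", "script")):
    """Return how many times each tag of interest opens in the raw HTML."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return {tag: parser.counts[tag] for tag in tags}


# Invented sample: the table only exists as a string inside a script.
sample = ('<html><body><iframe src="x"></iframe>'
          '<script>document.write("<table><tr><td>1</td></tr></table>")</script>'
          '</body></html>')
print(count_tags(sample))
```

Passing str(soup) or the raw response text to count_tags gives the same kind of summary for the real page.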

There are several possible solutions. Check this link for simple solutions.
