Question

I have an HTML file that contains a table. The table has 30 columns, but I only need to read a few of them.

What I have so far

Code:

from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed.
with open("myfile.htm") as f:
    soup = BeautifulSoup(f, "html.parser")
table = soup.find("table", attrs={"class": "myTable"})

# The first tr contains the field names.
headings = [th.get_text() for th in table.find("tr").find_all("th")]

datasets = []
for row in table.find_all("tr")[1:]:
    # Pair each heading with the cell in the same position.
    dataset = list(zip(headings, (td.get_text() for td in row.find_all("td"))))
    datasets.append(dataset)

for dataset in datasets:
    for field in dataset:
        print("{0:<16}: {1}".format(field[0], field[1]))

How do I specify which columns I want to read?

Answer 1

Option 1. Use table.find("tr").find_next() (spelled findNext() in BeautifulSoup 3).

https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.html#find-all-next-and-find-next

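Option 2. Use lxml and supply an XPath expression. BeautifulSoup has no XPath support of its own, so the query has to go through lxml.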
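
Neither option comes with code, so here is a minimal sketch of one way to keep only certain columns. It does not use find_next() or XPath. Instead it looks up the position of each wanted heading and picks the cells at those positions. The sample HTML, the function name read_columns and the parameter wanted are all made up for illustration, and the sketch assumes every data row has one td per heading (no colspan or missing cells).

```python
from bs4 import BeautifulSoup

# Hypothetical sample data; in the question the table comes from "myfile.htm".
HTML = """
<table class="myTable">
  <tr><th>Name</th><th>Age</th><th>City</th></tr>
  <tr><td>Alice</td><td>30</td><td>Paris</td></tr>
  <tr><td>Bob</td><td>25</td><td>Rome</td></tr>
</table>
"""


def read_columns(html, wanted):
    """Return one dict per data row, keeping only the headings listed in `wanted`."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "myTable"})
    rows = table.find_all("tr")

    # The first tr contains the field names.
    headings = [th.get_text(strip=True) for th in rows[0].find_all("th")]

    # Column positions of the wanted headings; raises ValueError if one is missing.
    indexes = [headings.index(name) for name in wanted]

    records = []
    for row in rows[1:]:
        cells = row.find_all("td")
        # Assumption: each row has as many td cells as there are headings.
        records.append({headings[i]: cells[i].get_text(strip=True) for i in indexes})
    return records


for record in read_columns(HTML, ["Name", "City"]):
    print(record)
```

If the column positions are already known, the headings.index() lookup can be skipped and a fixed list of indexes used instead.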
