How to specify which columns to read with BeautifulSoup

Date: 2017-03-09 14:32:05

Tags: python beautifulsoup

I have an HTML file that contains a table. The table has 30 columns, but I only need to read a few of them.

Here is the code I have so far:

from bs4 import BeautifulSoup

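# Name the parser explicitly; otherwise bs4 picks whichever one is installed and warns.
with open("myfile.htm") as f:
    soup = BeautifulSoup(f, "html.parser")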
table = soup.find("table", attrs={"class":"myTable"})

# The first tr contains the field names.
headings = [th.get_text() for th in table.find("tr").find_all("th")]

datasets = []
for row in table.find_all("tr")[1:]:
    dataset = zip(headings, (td.get_text() for td in row.find_all("td")))
    datasets.append(dataset)

for dataset in datasets:
    for field in dataset:
        print("{0:<16}: {1}".format(field[0], field[1]))

How can I specify which columns I want to read?

1 answer:

Answer 0 (score: 1)

Option 1. Use table.find("tr").findNext() (spelled find_next() in current BeautifulSoup 4 code).

https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.html#find-all-next-and-find-next
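A minimal sketch of how Option 1 might be applied, assuming the columns are chosen by heading text. The sample HTML, the `wanted` list, and the column names are made up for illustration; only the `myTable` class comes from the question. It looks up the positions of the wanted headings once, then walks the rows with find_next("tr") and keeps only those cells.

```python
from bs4 import BeautifulSoup

# Hypothetical sample data standing in for myfile.htm.
html = """
<table class="myTable">
  <tr><th>Name</th><th>Age</th><th>City</th><th>Email</th></tr>
  <tr><td>Ann</td><td>31</td><td>Oslo</td><td>ann@example.com</td></tr>
  <tr><td>Bob</td><td>45</td><td>Rome</td><td>bob@example.com</td></tr>
</table>
"""
wanted = ["Name", "City"]  # hypothetical: the columns to keep

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", attrs={"class": "myTable"})

# The first tr contains the field names; map each wanted name to its position.
header_row = table.find("tr")
headings = [th.get_text(strip=True) for th in header_row.find_all("th")]
indexes = [headings.index(name) for name in wanted]

rows = []
# find_next("tr") returns the next tr in document order, or None at the end.
# Caveat: it does not stop at the end of this table, so a document with
# several tables would need find_next_sibling("tr") or table.find_all("tr").
row = header_row.find_next("tr")
while row is not None:
    cells = row.find_all("td")
    rows.append({headings[i]: cells[i].get_text(strip=True) for i in indexes})
    row = row.find_next("tr")

for record in rows:
    print(record)
```

Selecting by heading text rather than by hard-coded position keeps the code working if the table's column order changes, at the cost of a ValueError when a wanted heading is missing.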

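Option 2. Use lxml alongside BeautifulSoup and supply an XPath expression. BeautifulSoup itself does not evaluate XPath, so the query has to go through lxml directly. An element's XPath can be copied from the developer tools in Chrome.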
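A rough sketch of Option 2, assuming lxml is installed and that the columns are picked by position. It calls lxml directly rather than going through BeautifulSoup. The sample page and the choice of columns 1 and 3 are hypothetical; only the `myTable` class comes from the question.

```python
from lxml import html as lxml_html

# Hypothetical sample page standing in for myfile.htm.
page = """<html><body>
<table class="myTable">
  <tr><th>Name</th><th>Age</th><th>City</th><th>Email</th></tr>
  <tr><td>Ann</td><td>31</td><td>Oslo</td><td>ann@example.com</td></tr>
  <tr><td>Bob</td><td>45</td><td>Rome</td><td>bob@example.com</td></tr>
</table>
</body></html>"""

doc = lxml_html.fromstring(page)

rows = []
# tr[td] skips the header row, which only has th cells.
for tr in doc.xpath('//table[@class="myTable"]//tr[td]'):
    # XPath positions are 1-based: keep the 1st and 3rd cell of each row.
    values = tr.xpath('./td[position()=1 or position()=3]/text()')
    rows.append([value.strip() for value in values])

for row in rows:
    print(row)
```

An XPath copied from Chrome usually includes a tbody step, because the browser inserts that element into the DOM. If the source file has no tbody, that step needs to be removed for the expression to match.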