I have an HTML file that contains a table. The table has 30 columns, but I only need to read a few of them.
My code so far:
from bs4 import BeautifulSoup

# Name the parser explicitly so bs4 does not have to guess one.
with open("myfile.htm") as f:
    soup = BeautifulSoup(f, "html.parser")
table = soup.find("table", attrs={"class": "myTable"})
# The first tr contains the field names.
headings = [th.get_text() for th in table.find("tr").find_all("th")]
datasets = []
for row in table.find_all("tr")[1:]:
    # Pair every heading with the matching cell of this row.
    dataset = zip(headings, (td.get_text() for td in row.find_all("td")))
    datasets.append(dataset)
for dataset in datasets:
    for field in dataset:
        print("{0:<16}: {1}".format(field[0], field[1]))
How can I specify which columns I want to read?
Answer 0 (score: 1)
Option 1. Use table.find("tr").findNext() (spelled find_next() in bs4):
https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.html#find-all-next-and-find-next
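A different approach, offered only as a minimal sketch: look up the positions of the wanted headings once, then pick just those cells from each row. The sample HTML, the column names, and the helper name read_columns are all hypothetical stand-ins, and the sketch assumes every data row has one td per heading (no colspan or missing cells).

```python
from bs4 import BeautifulSoup

# Hypothetical sample table standing in for myfile.htm.
html = """
<table class="myTable">
  <tr><th>Name</th><th>Age</th><th>City</th><th>Email</th></tr>
  <tr><td>Ann</td><td>31</td><td>Oslo</td><td>ann@example.com</td></tr>
  <tr><td>Bob</td><td>45</td><td>Lima</td><td>bob@example.com</td></tr>
</table>
"""


def read_columns(html_text, wanted, table_class="myTable"):
    """Return, per data row, a list of (heading, value) pairs for the wanted columns.

    read_columns is a hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html_text, "html.parser")
    table = soup.find("table", attrs={"class": table_class})
    rows = table.find_all("tr")
    # The first tr contains the field names.
    headings = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    # Map each wanted heading to its column position.
    # list.index raises ValueError if a heading is not present.
    indexes = [headings.index(name) for name in wanted]
    datasets = []
    for row in rows[1:]:
        cells = row.find_all("td")
        datasets.append(
            [(headings[i], cells[i].get_text(strip=True)) for i in indexes]
        )
    return datasets


datasets = read_columns(html, ["Name", "City"])
for dataset in datasets:
    for name, value in dataset:
        print("{0:<16}: {1}".format(name, value))
```

To use this with the file from the question, pass the file's contents as html_text and list the headings you need in wanted. Selecting by heading text rather than by a hard-coded index keeps the code working if the column order changes.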