I want to parse the Afk., Aantal and Zetels columns from this site: http://www.nlverkiezingen.com/TK2012.html, so that I can eventually save them as a JSON file.
I need to parse these elements before saving them to a JSON file.
I have
from bs4 import BeautifulSoup
import urllib
jaren = [str("2010"), str("2012")]
for Jaargetal in jaren:
    r = urllib.urlopen("http://www.nlverkiezingen.com/TK" + Jaargetal + ".html").read()
    soup = BeautifulSoup(r, "html.parser")
    tables = soup.find_all("table")
    for table in tables:
        header = soup.find_all("h1")[0].getText()
        print header
        trs = table.find_all("tr")[0].getText()
        print '\n'
        for tr in table.find_all("tr"):
            print "|".join([x.get_text().replace('\n','') for x in tr.find_all('td')])
I tried
from bs4 import BeautifulSoup
import urllib
jaren = [str("2010"), str("2012")]
for Jaargetal in jaren:
    r = urllib.urlopen("http://www.nlverkiezingen.com/TK" + Jaargetal + ".html").read()
    soup = BeautifulSoup(r, "html.parser")
    tables = soup.find_all("table")
    for table in tables:
        header = soup.find_all("h1")[0].getText()
        print header
        for tr in table.find_all("tr"):
            firstTd = tr.find("td")
            if firstTd and firstTd.has_attr("class") and "l" in firstTd['class']:
                tds = tr.find_all("td")
                for tr in table.find_all("tr"):
                    print "|".join([x.get_text().replace('\n','') for x in tr.find_all('td')])
                break
What am I doing wrong, or what should I do? Am I on the right track?
Answer 0 (score: 0)
One option for extracting only the desired columns is to check each column's index. Define the column indexes you are interested in:
DESIRED_COLUMNS = {1, 2, 5} # it is a set
Then use enumerate() together with find_all():
"|".join([x.get_text().replace('\n', '')
for index, x in enumerate(tr.find_all('td'))
if index in DESIRED_COLUMNS])
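Putting the pieces together with the JSON goal from the question, a minimal sketch of the whole script might look like this (the indexes in DESIRED_COLUMNS and the output filename verkiezingen.json are assumptions; check the actual positions of the Afk., Aantal and Zetels columns on the page and adjust as needed):

from bs4 import BeautifulSoup
import urllib
import json

DESIRED_COLUMNS = {1, 2, 5}  # assumed column indexes for Afk., Aantal and Zetels; verify against the page

jaren = ["2010", "2012"]
results = {}

for Jaargetal in jaren:
    r = urllib.urlopen("http://www.nlverkiezingen.com/TK" + Jaargetal + ".html").read()
    soup = BeautifulSoup(r, "html.parser")

    rows = []
    for table in soup.find_all("table"):
        for tr in table.find_all("tr"):
            # keep only the cells whose index is in DESIRED_COLUMNS
            cells = [x.get_text().replace('\n', '')
                     for index, x in enumerate(tr.find_all('td'))
                     if index in DESIRED_COLUMNS]
            if cells:
                rows.append(cells)
    results[Jaargetal] = rows

# write everything out as JSON (the filename is just an example)
with open("verkiezingen.json", "w") as f:
    json.dump(results, f, indent=2)

This keeps your Python 2 style urllib.urlopen call; on Python 3 you would use urllib.request.urlopen instead.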