The code below scrapes data from this page: http://www.gbgb.org.uk/resultsMeeting.aspx?id=136005
It scrapes all the relevant fields and prints them to the screen. However, I would like to print the data in tabular form to a csv file, for export into a spreadsheet or database.
In the site's source HTML, the track, date, datetime (race time), grade, distance and prize come from the div class "resultsBlockHeader", which forms the top section of the race card on the web page.
The body of the race in the source HTML comes from the div class "resultsBlock", which includes finishing position (Fin), Greyhound, Trap, SP, Time/Sec and Time Distance.
The end result would look something like this:
track,date,datetime,grade,distance,prize,fin,greyhound,trap,SP,timeSec,timeDistance
Is this doable, or do I have to print it to the screen in tabular form in order to export it to csv?
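(For reference: the csv module in the standard library writes rows straight to a file, so no intermediate tabular printing is needed. A minimal sketch with a hypothetical data row, just to show the shape of the output; the filename and values are illustrative only.)

```python
import csv

# The desired header row from the question.
header = ["track", "date", "datetime", "grade", "distance", "prize",
          "fin", "greyhound", "trap", "SP", "timeSec", "timeDistance"]

with open("example.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    # One made-up row to illustrate: each scraped field becomes one cell.
    writer.writerow(["Sheffield", "02/02/16", "11:06", "A5", "500m", "140",
                     "1st", "Example Dog", "2", "5/2", "29.50", "Led"])
```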
from urllib import urlopen  # Python 2; on Python 3 use urllib.request.urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.gbgb.org.uk/resultsMeeting.aspx?id=136005")
bsObj = BeautifulSoup(html, 'lxml')

nameList = bsObj.findAll("div", {"class": "track"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("div", {"class": "distance"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("div", {"class": "prizes"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "first essential fin"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "essential greyhound"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "trap"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "sp"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "timeSec"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "timeDistance"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "essential trainer"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "first essential comment"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("div", {"class": "resultsBlockFooter"})
for name in nameList:
    print(name.get_text())
nameList = bsObj.findAll("li", {"class": "first essential"})
for name in nameList:
    print(name.get_text())
Answer 0 (score: 1)
Not sure why you didn't follow the code suggested in this answer to your previous question - it actually solves the grouped-fields problem.
Here is follow-up code that dumps track, date and greyhound to csv:
import csv
from bs4 import BeautifulSoup
import requests

html = requests.get("http://www.gbgb.org.uk/resultsMeeting.aspx?id=135754").text
soup = BeautifulSoup(html, 'lxml')

rows = []
for header in soup.find_all("div", class_="resultsBlockHeader"):
    track = header.find("div", class_="track").get_text(strip=True).encode('ascii', 'ignore').strip("|")
    date = header.find("div", class_="date").get_text(strip=True).encode('ascii', 'ignore').strip("|")

    results = header.find_next_sibling("div", class_="resultsBlock").find_all("ul", class_="line1")
    for result in results:
        greyhound = result.find("li", class_="greyhound").get_text(strip=True)

        rows.append({
            "track": track,
            "date": date,
            "greyhound": greyhound
        })

with open("results.csv", "w") as f:
    writer = csv.DictWriter(f, ["track", "date", "greyhound"])
    for row in rows:
        writer.writerow(row)
Contents of results.csv after running the code:
Sheffield,02/02/16,Miss Eastwood
Sheffield,02/02/16,Sapphire Man
Sheffield,02/02/16,Swift Millican
...
Sheffield,02/02/16,Geelo Storm
Sheffield,02/02/16,Reflected Light
Sheffield,02/02/16,Boozed Flame
Note that I'm using requests here, but you can use urllib2 if you wish.
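The code above only exports three columns; the same pattern extends to the full column list from the question. The sketch below parses an inline HTML snippet constructed from the question's description of the page (class names like resultsBlockHeader, prizes and timeDistance are assumptions to verify against the real site), and uses the stdlib html.parser so it runs without lxml:

```python
import csv
from bs4 import BeautifulSoup

# Inline snippet standing in for the live page. The tag/class layout
# follows the question's description and is an assumption, not a
# verified copy of the site's HTML.
html = """
<div class="resultsBlockHeader">
  <div class="track">Sheffield</div>
  <div class="date">02/02/16</div>
  <div class="datetime">11:06</div>
  <div class="grade">A5</div>
  <div class="distance">500m</div>
  <div class="prizes">140</div>
</div>
<div class="resultsBlock">
  <ul class="line1">
    <li class="first essential fin">1st</li>
    <li class="essential greyhound">Example Dog</li>
    <li class="trap">2</li>
    <li class="sp">5/2</li>
    <li class="timeSec">29.50</li>
    <li class="timeDistance">29.50</li>
  </ul>
</div>
"""

FIELDS = ["track", "date", "datetime", "grade", "distance", "prize",
          "fin", "greyhound", "trap", "SP", "timeSec", "timeDistance"]

soup = BeautifulSoup(html, "html.parser")

rows = []
for header in soup.find_all("div", class_="resultsBlockHeader"):
    # Header-level fields are shared by every race row under this block.
    meta = {
        "track": header.find("div", class_="track").get_text(strip=True),
        "date": header.find("div", class_="date").get_text(strip=True),
        "datetime": header.find("div", class_="datetime").get_text(strip=True),
        "grade": header.find("div", class_="grade").get_text(strip=True),
        "distance": header.find("div", class_="distance").get_text(strip=True),
        "prize": header.find("div", class_="prizes").get_text(strip=True),
    }
    block = header.find_next_sibling("div", class_="resultsBlock")
    for result in block.find_all("ul", class_="line1"):
        row = dict(meta)  # copy the shared header fields per runner
        row["fin"] = result.find("li", class_="fin").get_text(strip=True)
        row["greyhound"] = result.find("li", class_="greyhound").get_text(strip=True)
        row["trap"] = result.find("li", class_="trap").get_text(strip=True)
        row["SP"] = result.find("li", class_="sp").get_text(strip=True)
        row["timeSec"] = result.find("li", class_="timeSec").get_text(strip=True)
        row["timeDistance"] = result.find("li", class_="timeDistance").get_text(strip=True)
        rows.append(row)

with open("full_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

Note that searching by a single class (`class_="fin"`) matches elements whose class attribute contains that class among others, so the multi-class `li` elements are found correctly.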