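I'm following a tutorial on web scraping. The website has changed since the video was made, so I had to add a couple of lines, and now the CSV file the script creates has two header rows. Can someone help me figure out what I need to do to fix this? Thanks! Here is my code: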
import urllib
import urllib.request
from bs4 import BeautifulSoup
import os

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

playerdatasaved = ""
soup = make_soup("https://www.basketball-reference.com/players/a/")
for record in soup.findAll('tr'):
    playerdata = ""
    for data in record.findAll('th'):  # <-- added this line
        playerdata = playerdata + "," + data.text  # <-- added this line
    for data in record.findAll('td'):
        playerdata = playerdata + "," + data.text
    if len(playerdata) != 0:
        playerdatasaved = playerdatasaved + "\n" + playerdata[1:]

header = "Player, From, To, Pos, Ht, Wt, Birth Date, Colleges"
file = open(os.path.expanduser("basketball.csv"), "wb")
file.write(bytes(header, encoding="ascii", errors='ignore'))
file.write(bytes(playerdatasaved, "ascii", errors='ignore'))
The header of the CSV file shows the following:
Player From To Pos Ht Wt Birth Date Colleges
Player From To Pos Ht Wt Birth Date Colleges
I tried removing the header variable and the header from the file write command, but to no avail. Thanks!
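Answer 0 (score: 0)
As I said in the comments, you need to remove one of the two sets of headers, preferably the one in your code, and keep the one that comes from the web page. The lines you added that read the 'th' cells already pick up the page's own header row, so the hard-coded header is written a second time. Just delete the following lines: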
header = "Player, From, To, Pos, Ht, Wt, Birth Date, Colleges"
file.write(bytes(header, encoding="ascii", errors='ignore'))
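For reference, here is a minimal sketch of what the script might look like once the hard-coded header is gone. It is not the exact code from the tutorial: the function names table_rows, rows_to_csv and save_players_csv are made up for this illustration, and it swaps the manual string concatenation for Python's standard csv module, which quotes any cell that happens to contain a comma (a date written like "June 24, 1968", for example). It assumes, as in the question, that the page's own header row is made of 'th' cells inside a 'tr'.

```python
import csv
import io
import urllib.request

from bs4 import BeautifulSoup


def table_rows(html):
    """Return one list of cell texts per non-empty <tr> in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for record in soup.find_all("tr"):
        # Collect <th> and <td> cells together, in document order.
        # The page's header row arrives this way, so no separate
        # hard-coded header is needed.
        cells = [cell.get_text(strip=True) for cell in record.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text, quoting cells that contain commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def save_players_csv(url, path):
    """Fetch the page at url and write its table rows to path as CSV."""
    with urllib.request.urlopen(url) as page:
        html = page.read()
    with open(path, "w", newline="", encoding="utf-8") as out:
        out.write(rows_to_csv(table_rows(html)))
```

Calling save_players_csv("https://www.basketball-reference.com/players/a/", "basketball.csv") should then produce a file with a single header row, provided the page still has the layout described in the question.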