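My code below (almost) manages to split each player's data into rows, with the column values separated by commas. However, the player names seem to have underlying child nodes, and those also show up on separate lines. I only want the text of the name, not the link. Some records are also duplicated in my output. Any help would be greatly appreciated! I'm using BS4 and Python 3.5. Here is my code: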
import urllib
import urllib.request
from bs4 import BeautifulSoup
def make_soup(url):
    page = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(page, "html.parser")
    return soupdata
currentdata = ""
soup = make_soup("http://www.foxsports.com/soccer/stats?competition=1&season=20160&category=STANDARD&pos=0&team=0&isOpp=0&sort=3&sortOrder=0&page=0")
for record in soup.findAll('tr'):
    playerdata = ""
    for data in record.findAll('td'):
        playerdata = playerdata + "," + data.text
    currentdata = currentdata + "\n" + playerdata
print(currentdata)
Answer (score: 1):
import urllib
import urllib.request
from bs4 import BeautifulSoup
def make_soup(url):
    page = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(page, "html.parser")
    return soupdata
soup = make_soup("http://www.foxsports.com/soccer/stats?competition=1&season=20160&category=STANDARD&pos=0&team=0&isOpp=0&sort=3&sortOrder=0&page=0")
# class_=False: only visit <tr> tags that have no class attribute
for record in soup.findAll('tr', class_=False):
    # join the strings inside each cell with "," and strip surrounding whitespace
    row = [data.get_text(',', strip=True) for data in record.findAll('td')]
    print(' '.join(row))
Output:
1,Sánchez, Alexis,Sánchez, A.,ARS 21 20 1786 14 7 30 72 3 0
1,Costa, Diego,Costa, D.,CHE 19 19 1681 14 5 26 57 5 0
1,Ibrahimovic, Zlatan,Ibrahimovic, Z.,MUN 20 20 1800 14 3 36 89 5 0
4,Kane, Harry,Kane, H.,TOT 16 16 1360 13 2 27 53 0 0
5,Lukaku, Romelu,Lukaku, R.,EVE 20 19 1737 12 4 28 55 3 0
5,Defoe, Jermain,Defoe, J.,SUN 21 21 1882 12 2 18 57 1 0
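In the output above the name cell itself contains commas (Sánchez, Alexis), so lines built by joining cells with bare commas or spaces cannot be split back into columns reliably. If you need real comma-separated values, one option (not part of the answer) is to keep each row as a list of cell strings and let Python's standard csv module quote any field that contains a comma. A minimal sketch: the values are copied from the output above, but splitting them into these particular cells is an illustrative assumption, not the page's actual column layout.

```python
import csv
import io

# Illustrative rows: values copied from the output above, but this split
# into cells is an assumption, not the real column layout of the page.
rows = [
    ["1", "Sánchez, Alexis", "ARS", "21", "20", "1786", "14"],
    ["1", "Costa, Diego", "CHE", "19", "19", "1681", "14"],
]

# csv.writer (default QUOTE_MINIMAL) quotes any field containing the delimiter.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)
text = buffer.getvalue()
print(text)
# 1,"Sánchez, Alexis",ARS,21,20,1786,14
# 1,"Costa, Diego",CHE,19,19,1681,14

# Reading the text back recovers the original cells, embedded commas included.
parsed = list(csv.reader(text.splitlines()))
assert parsed == rows
```

When writing to a real file instead of a StringIO, open it with newline='' as the csv documentation recommends, and with encoding='utf-8' so accented names like Sánchez survive.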
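For tr, use class_=False: this selects only the tr tags that have no class attribute. get_text() lets you define a separator, so the strings inside one cell (for example the player's linked name) are joined into a single value instead of landing on separate lines; strip=True also trims the whitespace around each string.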
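To see both points in isolation without fetching the Fox Sports page, here is a minimal sketch on a small, made-up HTML snippet. The markup is hypothetical and only imitates the problem described in the question: a header row with a class attribute, and a name cell whose text is spread across a link, a span and the newlines between them. It compares .text with get_text(",", strip=True) and shows what class_=False filters out.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not the real Fox Sports table: the header row has a
# class attribute, and the name cell holds a link plus a short-name span.
html = """<table>
<tr class="header"><th>Rank</th><th>Player</th><th>GP</th></tr>
<tr>
<td>1</td>
<td>
<a href="#">Sánchez, Alexis</a>
<span>Sánchez, A.</span>
</td>
<td>21</td>
</tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")

# .text concatenates every string in the cell, including the newlines
# between tags, so in markup like this the name spans several lines.
name_cell = soup.find_all("td")[1]
print(repr(name_cell.text))  # '\nSánchez, Alexis\nSánchez, A.\n'

# class_=False matches only <tr> tags with no class attribute at all.
all_rows = soup.find_all("tr")
plain_rows = soup.find_all("tr", class_=False)
print(len(all_rows), len(plain_rows))  # 2 1

# get_text(",", strip=True) strips each string, skips the ones that end up
# empty, and joins the rest with ",".
table = [[td.get_text(",", strip=True) for td in tr.find_all("td")]
         for tr in plain_rows]
print(table)  # [['1', 'Sánchez, Alexis,Sánchez, A.', '21']]
```

find_all is the current bs4 name for the findAll method used in the code above; both accept the same arguments here.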