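I found a cool Python script that scrapes player information from NFL rosters, but I would also like to add NFL Combine results to the data. I've included an example for one player below.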
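import urllib.request
from bs4 import BeautifulSoup
# urlopen needs a full URL including the scheme
URL2 = 'http://www.nfl.com/player/deandrewwhite/2552657/combine'
# Name the parser explicitly so bs4 does not have to guess one
soupCombine = BeautifulSoup(urllib.request.urlopen(URL2), "html.parser")
# Each combine drill sits in a <div class="tp-title">
Combinestats = soupCombine.find_all("div", attrs = {"class": "tp-title"})
Combinestats[0].contents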
Produces:
['3 Cone Drill', <span class="tp-results">6.97 secs</span>]
How do I get the following from Combinestats[0].contents?
DrillName = '3 Cone Drill'
DrillResult = 6.97
Here are the items in Combinestats for reference.
for ii in range(len(Combinestats)):
    print(Combinestats[ii].contents)
['3 Cone Drill', <span class="tp-results">6.97 secs</span>]
['40 Yard Dash', <span class="tp-results">4.44 Secs</span>]
['Broad Jump', <span class="tp-results">118.0 inches</span>]
['20 Yard Shuttle', <span class="tp-results">4.18 secs</span>]
['Vertical Jump', <span class="tp-results">34.5 inches</span>]
Answer 0 (score: 4)
Just use a list comprehension.
resultSet = soup.find_all("div", attrs = {"class": "tp-title"})
stats = [
    (i.contents[0], i.contents[1].text) for i in resultSet
]
Or, equivalently, a for loop.
stats = []
for i in resultSet:
    # append takes a single argument, so pass the pair as one tuple
    stats.append((i.contents[0], i.contents[1].text))
print(stats)
[
('40 Yard Dash', '4.44 Secs'),
('3 Cone Drill', '6.97 secs'),
('Broad Jump', '118.0 inches'),
('20 Yard Shuttle', '4.18 secs'),
('Vertical Jump', '34.5 inches')
]
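Note that the results above are still strings such as '6.97 secs', while the question asks for DrillResult = 6.97 as a number. One way to get there is to split off the unit and convert the rest. This is only a minimal sketch: parse_result is a hypothetical helper name, and it assumes every result looks like "<number> <unit>", so float() will raise ValueError on anything else.

```python
def parse_result(text):
    """Split a string like '6.97 secs' into (6.97, 'secs').

    Hypothetical helper; assumes the '<number> <unit>' format
    seen in the sample output.
    """
    value, _, unit = text.strip().partition(" ")
    # Lower-case the unit because the page mixes 'secs' and 'Secs'
    return float(value), unit.lower()


# Sample pairs copied from the output above
stats = [
    ("40 Yard Dash", "4.44 Secs"),
    ("3 Cone Drill", "6.97 secs"),
    ("Broad Jump", "118.0 inches"),
]

# Map each drill name to its (value, unit) pair
parsed = {name: parse_result(result) for name, result in stats}

DrillName = "3 Cone Drill"
DrillResult, DrillUnit = parsed[DrillName]
print(DrillName, DrillResult, DrillUnit)
```

Keeping the unit next to the number makes it harder to mix up seconds and inches later on.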
Answer 1 (score: 1)
Here is another way to do the same thing. It looks a little awkward, though.
import requests
from bs4 import BeautifulSoup
URL = "http://www.nfl.com/player/deandrewwhite/2552657/combine"
res = requests.get(URL)
soup = BeautifulSoup(res.text,"lxml")
items = {item.select_one(".tp-results").previous_sibling:item.select_one(".tp-results").text for item in soup.select(".tp-title")}
print(items)
Output:
{'3 Cone Drill': '6.97 secs', '20 Yard Shuttle': '4.18 secs', '40 Yard Dash': '4.44 Secs', 'Vertical Jump': '34.5 inches', 'Broad Jump': '118.0 inches'}