Extracting items from element.ResultSet

Date: 2018-03-23 16:11:51

Tags: python python-3.x web-scraping beautifulsoup

I found a cool Python script that scrapes player information from NFL rosters, but I would also like to add NFL Combine results to the data. Below is an example for a single player.

import urllib.request
from bs4 import BeautifulSoup

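# urlopen needs an explicit scheme, so the URL starts with http://
URL2 = 'http://www.nfl.com/player/deandrewwhite/2552657/combine'
# Naming the parser avoids BeautifulSoup's "no parser specified" warning
soupCombine = BeautifulSoup(urllib.request.urlopen(URL2), 'html.parser')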
Combinestats = soupCombine.find_all("div", attrs = {"class": "tp-title"})
Combinestats[0].contents

Output:

['3 Cone Drill', <span class="tp-results">6.97 secs</span>]

How do I get the following from Combinestats[0].contents?

DrillName = '3 Cone Drill'

DrillResult = 6.97

Here are the items in Combinestats for reference.

for ii in range(len(Combinestats)):
     print(Combinestats[ii].contents)

['3 Cone Drill', <span class="tp-results">6.97 secs</span>]
['40 Yard Dash', <span class="tp-results">4.44 Secs</span>]
['Broad Jump', <span class="tp-results">118.0 inches</span>]
['20 Yard Shuttle', <span class="tp-results">4.18 secs</span>]
['Vertical Jump', <span class="tp-results">34.5 inches</span>]

2 answers:

Answer 0 (score: 4)

Just use a list comprehension.

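# soup is the parsed page (soupCombine in the question)
resultSet = soup.find_all("div", attrs = {"class": "tp-title"})
# contents[0] is the drill name (a text node); contents[1] is the result <span>
stats = [
    (i.contents[0], i.contents[1].text) for i in resultSet
]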

Or, with a for loop.

stats = []
for i in resultSet:
    # list.append takes a single argument, so the pair is wrapped in a tuple
    stats.append((i.contents[0], i.contents[1].text))

print(stats)
[
    ('40 Yard Dash', '4.44 Secs'),
    ('3 Cone Drill', '6.97 secs'),
    ('Broad Jump', '118.0 inches'),
    ('20 Yard Shuttle', '4.18 secs'),
    ('Vertical Jump', '34.5 inches')
]
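
The question asks for DrillResult as the number 6.97, while the tuples above still hold strings such as '6.97 secs'. One possible way to split the number from the unit is sketched below. The helper name parse_result and the regular expression are illustrative assumptions, not part of the original answer, and the stats list is copied by hand from the printed output so the snippet runs without network access.

```python
import re

def parse_result(text):
    """Split a string such as '6.97 secs' into (6.97, 'secs').

    Hypothetical helper: assumes the string starts with a decimal number
    followed by an optional unit.
    """
    match = re.match(r"\s*([-+]?\d+(?:\.\d+)?)\s*(.*)", text)
    if match is None:
        raise ValueError(f"no leading number in {text!r}")
    value, unit = match.groups()
    # Units are lowercased because the page mixes 'secs' and 'Secs'
    return float(value), unit.strip().lower()

# Sample data copied from the output shown above
stats = [
    ("3 Cone Drill", "6.97 secs"),
    ("40 Yard Dash", "4.44 Secs"),
    ("Broad Jump", "118.0 inches"),
]

# Map each drill name to a (value, unit) pair
parsed = {name: parse_result(result) for name, result in stats}

DrillName = "3 Cone Drill"
DrillResult = parsed[DrillName][0]
print(DrillName, DrillResult)
```

Keeping the unit next to the value makes it harder to mix up seconds and inches later on.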

Answer 1 (score: 1)

Here is another way to do the same thing. It looks a bit awkward, though.

import requests
from bs4 import BeautifulSoup

URL = "http://www.nfl.com/player/deandrewwhite/2552657/combine"
res = requests.get(URL)
soup = BeautifulSoup(res.text,"lxml")
items = {item.select_one(".tp-results").previous_sibling:item.select_one(".tp-results").text for item in soup.select(".tp-title")}
print(items)

Output:

{'3 Cone Drill': '6.97 secs', '20 Yard Shuttle': '4.18 secs', '40 Yard Dash': '4.44 Secs', 'Vertical Jump': '34.5 inches', 'Broad Jump': '118.0 inches'}
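
If the previous_sibling lookup feels awkward, a possibly tidier variant is to iterate over each div's stripped_strings, which yields the text fragments inside the tag with surrounding whitespace removed. The sketch below is only an illustration: the inline HTML is an assumption modeled on the printed output in the question, not the real page, so that it runs offline, and it uses the built-in html.parser instead of lxml.

```python
from bs4 import BeautifulSoup

# Inline HTML modeled on the output shown in the question (an assumption
# about the page structure, used here so the example needs no network).
html = """
<div class="tp-title">3 Cone Drill<span class="tp-results">6.97 secs</span></div>
<div class="tp-title">40 Yard Dash<span class="tp-results">4.44 Secs</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

items = {}
for div in soup.find_all("div", class_="tp-title"):
    # Each div holds exactly two text fragments: the drill name and the result
    name, result = div.stripped_strings
    items[name] = result

print(items)
```

The two-name unpacking raises ValueError if a div ever contains more or fewer than two text fragments, which surfaces layout changes on the page instead of silently producing bad data.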