Scraping multiple data tags from HTML using Beautiful Soup

Date: 2018-07-06 16:54:42

Tags: python html css beautifulsoup python-requests

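I'm trying to scrape HTML to build a dictionary that maps each pitcher's name to his handedness (throwing hand). The data tags are buried, and so far I've only been able to collect the pitchers' names from the dataset. The HTML output (for each player) looks like this: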

<div class="pitcher players">
<input name="import-data" type="hidden" value="%5B%7B%22slate_id%22%3A20190%2C%22type%22%3A%22classic%22%2C%22player_id%22%3A%2210893103%22%2C%22salary%22%3A%2211800%22%2C%22position%22%3A%22SP%22%2C%22fpts%22%3A14.96%7D%2C%7B%22slate_id%22%3A20192%2C%22type%22%3A%22classic%22%2C%22player_id%22%3A%2210894893%22%2C%22salary%22%3A%2211800%22%2C%22position%22%3A%22SP%22%2C%22fpts%22%3A14.96%7D%2C%7B%22slate_id%22%3A20193%2C%22type%22%3A%22classic%22%2C%22player_id%22%3A%2210895115%22%2C%22salary%22%3A%2211800%22%2C%22position%22%3A%22SP%22%2C%22fpts%22%3A14.96%7D%5D"/>
<a class="player-popup" data-url="https://rotogrinders.com/players/johnny-cueto-11193?site=draftkings" href="https://rotogrinders.com/players/johnny-cueto-11193">Johnny Cueto</a>
<span class="meta stats">
<span class="stats">
            R
        </span>
<span class="salary" data-role="salary" data-salary="$11.8K">
            $11.8K
        </span>
<span class="fpts" data-fpts="14.96" data-product="56" data-role="authorize" title="Projected Points">14.96</span>

I've bent over backwards and come up empty; I'm sure I'm overthinking it. Here's my code so far:

import requests
from bs4 import BeautifulSoup

url = "https://rotogrinders.com/lineups/mlb?site=draftkings"

r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")

players_confirmed = {}
results = [soup.find_all("div", {'class':'pitcher players'})]

What's the best way to iterate over the result set to get the finer-grained data tag information I need?

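I need the text inside the <a class="player-popup"> element in the HTML, and the handedness from the <span class="stats"> tag. Ideally, I'd end up with a dictionary like this:

{Johnny Cueto: R, Player 2: L, ...}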

1 Answer:

Answer 0: (score: 0)

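import requests
from bs4 import BeautifulSoup

url = "https://rotogrinders.com/lineups/mlb?site=draftkings"

r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")

# One <div class="pitcher players"> per pitcher
results = soup.find_all("div", {'class': 'pitcher players'})

players_confirmed = {}
for j in results:
    # j.a is the first <a> in the div (the player's name);
    # select(".stats") returns the outer "meta stats" span first,
    # so index 1 is the inner span that holds the handedness
    players_confirmed[j.a.text] = j.select(".stats")[1].text.strip()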

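Each element returned by find_all is itself a tag, so while iterating over the results you can call select or find on each one to reach the nested elements.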
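
If you want to try this without hitting the live site, here is a minimal sketch that runs against an inline snippet. SAMPLE_HTML is a trimmed copy of the markup from the question with closing tags added, which is an assumption about how the full page nests these elements. The function name pitcher_handedness is made up for illustration, and the CSS selectors are one possible way to target the tags, not necessarily what the page requires.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, with closing tags added
# (an assumption about how the full page nests these elements).
SAMPLE_HTML = """
<div class="pitcher players">
  <a class="player-popup" href="https://rotogrinders.com/players/johnny-cueto-11193">Johnny Cueto</a>
  <span class="meta stats">
    <span class="stats">
            R
    </span>
    <span class="salary" data-role="salary" data-salary="$11.8K">
            $11.8K
    </span>
  </span>
</div>
"""


def pitcher_handedness(html):
    """Map each pitcher's name to the text of the nested stats span."""
    soup = BeautifulSoup(html, "html.parser")
    handedness = {}
    # "div.pitcher.players" matches divs carrying both classes.
    for block in soup.select("div.pitcher.players"):
        name_tag = block.select_one("a.player-popup")
        # The outer span has classes "meta stats"; the inner "stats" span
        # holds the handedness letter.
        hand_tag = block.select_one("span.meta span.stats")
        if name_tag is None or hand_tag is None:
            continue  # skip blocks that lack either piece
        handedness[name_tag.get_text(strip=True)] = hand_tag.get_text(strip=True)
    return handedness


print(pitcher_handedness(SAMPLE_HTML))
```

Selecting by class name instead of by position (index 1) should keep working if the page adds another element with the stats class, and the None check skips any pitcher block that is missing the name or the handedness rather than raising an error.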