Question

Here is my current code:

from bs4 import BeautifulSoup

import requests

header = {'User-agent' : 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'}

url  = requests.get("https://d1baseball.com/scores/?date=20170407&c=PKzxg", headers = header).text

soup = BeautifulSoup(url, 'html.parser')

boxscores = soup.find_all('a', text = 'Box Score')

for eachboxscore in boxscores:
    links = eachboxscore.get('href')
    print(links)
    url = requests.get(links, headers = header).text
    soup = BeautifulSoup(url, 'html.parser')
    pbp = soup.find_all('a', text = {'Play-By-Play' or 'Play by Play' or 'Play By Play' or 'Play-by-Play'})
    print(pbp)
    for eachpbp in pbp:
        button = eachpbp.get('href')
        print(button)

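I'm not sure whether I'm using the logical operator 'or' correctly here. For every URL in the for loop (six different links in this case), I want to search the HTML for 'a' tags and locate the Play-By-Play string so that I can get the corresponding link to the play-by-play data. (Note that sometimes this link is simply another URL, while other times it only points to another location on the same page, e.g. #play-by-play.)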

A quick follow-up question: how do I "click" this redirect link? Or would it be easier to append it to the end of the URL I've already found?

Thanks in advance for your help!

Answer 1

text=['Play-By-Play', 'Play by Play', 'Play By Play', 'Play-by-Play']
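For what it's worth, here is a minimal sketch of why the expression in the question only ever searched for one label. Python evaluates the `or` chain before Beautiful Soup sees it, and `or` returns its first truthy operand, so the set ends up holding a single string. The variable names `attempted` and `labels` are only illustrative.

```python
# `or` between non-empty strings returns the first one,
# so this builds a set with a single element.
attempted = {'Play-By-Play' or 'Play by Play' or 'Play By Play' or 'Play-by-Play'}
print(attempted)  # {'Play-By-Play'}

# A list keeps every variant, which is what the text/string argument accepts.
labels = ['Play-By-Play', 'Play by Play', 'Play By Play', 'Play-by-Play']
print(len(labels))  # 4
```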

See also:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-string-argument

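With string you can search for strings instead of tags. As with name and the keyword arguments, you can pass in a string, a regular expression, a list, a function, or the value True. Here are some examples: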

soup.find_all(string="Elsie")

[u'Elsie']

soup.find_all(string=["Tillie", "Elsie", "Lacie"])

[u'Elsie', u'Lacie', u'Tillie']

soup.find_all(string=re.compile("Dormouse"))

[u"The Dormouse's story", u"The Dormouse's story"]

def is_the_only_string_within_a_tag(s):
    """Return True if this string is the only child of its parent tag."""
    return (s == s.parent.string)

soup.find_all(string=is_the_only_string_within_a_tag)

[u"The Dormouse's story", u"The Dormouse's story", u'Elsie', u'Lacie', u'Tillie', u'...']
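Since a regular expression is accepted too, the four label spellings from the question could also be covered by one pattern. The sketch below only exercises the pattern with the standard `re` module; `PBP_PATTERN` is a made-up name, and the pattern itself is an assumption about which spellings the site uses.

```python
import re

# Assumed pattern: "play", "by", "play" separated by spaces or hyphens,
# in any capitalisation.
PBP_PATTERN = re.compile(r'play[\s-]+by[\s-]+play', re.IGNORECASE)

candidates = ['Play-By-Play', 'Play by Play', 'Play By Play',
              'Play-by-Play', 'Box Score']

# Keep only the labels the pattern finds a match in.
matching = [label for label in candidates if PBP_PATTERN.search(label)]
print(matching)
```

The compiled pattern could then be passed in place of the list, e.g. `soup.find_all('a', string=PBP_PATTERN)`.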

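Although string is for finding strings, you can combine it with arguments that find tags: Beautiful Soup will find all tags whose .string matches your value for string. This code finds the tags whose .string is "Elsie":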

soup.find_all("a", string="Elsie")

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
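Applied to the question, combining the tag name with a list of labels might look like the sketch below. The HTML snippet, its URLs, and the names `SAMPLE_HTML` and `PBP_LABELS` are invented for illustration and are not taken from the real box score pages.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded box score page.
SAMPLE_HTML = """
<a href="/boxscore/1">Box Score</a>
<a href="#play-by-play">Play-By-Play</a>
<a href="https://stats.example.org/pbp/2">Play by Play</a>
<a href="/recap">Recap</a>
"""

PBP_LABELS = ['Play-By-Play', 'Play by Play', 'Play By Play', 'Play-by-Play']

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Find <a> tags whose .string equals any of the labels, then collect their hrefs.
pbp_links = soup.find_all('a', string=PBP_LABELS)
hrefs = [link.get('href') for link in pbp_links]
print(hrefs)
```

Note that the match is on the exact .string, so a label with extra surrounding whitespace would need a regular expression or a function instead.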

The string argument is new in Beautiful Soup 4.4.0. In earlier versions it was called text:

soup.find_all("a", text="Elsie")

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
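On the follow-up about "clicking" the link: requests has no notion of clicking, so one option is to resolve the href against the URL of the page it came from. Here is a rough sketch using `urljoin` and `urldefrag` from the standard library; `resolve_link` and the example.com URLs are made up for illustration.

```python
from urllib.parse import urljoin, urldefrag

def resolve_link(page_url, href):
    """Return (url_without_fragment, fragment) for an href found on page_url."""
    # urljoin handles absolute URLs, relative paths, and bare "#fragment" hrefs.
    absolute = urljoin(page_url, href)
    # Split off the "#..." part, which is never sent to the server.
    url_without_fragment, fragment = urldefrag(absolute)
    return url_without_fragment, fragment

page = "https://example.com/boxscore/1/"
print(resolve_link(page, "#play-by-play"))
print(resolve_link(page, "https://stats.example.org/pbp/1"))
```

If the returned URL is the page you already downloaded and the fragment is non-empty, the play-by-play section should already be in the HTML you have, so there is nothing further to request. Otherwise the returned URL can be passed to requests.get as in the question's loop.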
