I have this code that uses BeautifulSoup to collect some data from a website:
import requests
from bs4 import BeautifulSoup
url = "http://hearthstone.gamepedia.com/Patches"
page = requests.get(url)
soup = BeautifulSoup(page.content,"html.parser")
variable = soup.find('div',{"id":"mw-content-text"})
variable = variable.find_all('ul')[2]
variable = variable.find('li')
variable = variable.find_all('a')[1]
print(variable.text)
The output should be:
Patch 7.0.0.15590
Following these steps in order, I am able to find the exact tag I want.
To simplify it, how can I write this as a single line of code?
Variable = harsoup.find('div',{"id":"mw-content-text"}).find_all('ul')[2].find('li').find_all('a')[1]
This is what I am trying to achieve, but it does not seem to work the same way.
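For reference, each find() call (and each indexed find_all() result) returns a Tag that supports the same search methods, so the steps can be chained on one line. Below is a minimal sketch, assuming the chain starts from the same `soup` object as the step-by-step version. The inline HTML is a made-up stand-in for the wiki page, so the real markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption, not the real page source).
html = """
<div id="mw-content-text">
  <ul><li><a href="/a">first list</a></li></ul>
  <ul><li><a href="/b">second list</a></li></ul>
  <ul>
    <li><a href="/Hearthstone">Hearthstone</a> <a href="/Patch_7.0.0.15590">Patch 7.0.0.15590</a></li>
    <li><a href="/Hearthstone">Hearthstone</a> <a href="/Patch_6.2.0.15300">Patch 6.2.0.15300</a></li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Step-by-step version, as in the question.
content = soup.find("div", {"id": "mw-content-text"})
third_list = content.find_all("ul")[2]
first_item = third_list.find("li")
step_by_step = first_item.find_all("a")[1]

# Chained version: the same calls joined on one line.
chained = soup.find("div", {"id": "mw-content-text"}).find_all("ul")[2].find("li").find_all("a")[1]

print(chained.text)
```

Note that a chain like this raises an error if any intermediate find() returns None or an index is out of range, so it is only as robust as the page layout.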
Answer 0 (score: 0)
import re
soup.find_all(href=re.compile(r'/Patch_'))
Out:
[<a href="/Patch_7.0.0.15590" title="Patch 7.0.0.15590">Patch 7.0.0.15590</a>,
<a href="/Patch_6.2.0.15300" title="Patch 6.2.0.15300">Patch 6.2.0.15300</a>,
<a href="/Patch_6.2.0.15181" title="Patch 6.2.0.15181">Patch 6.2.0.15181</a>,
<a href="/Patch_6.1.3.14830" title="Patch 6.1.3.14830">Patch 6.1.3.14830</a>,
<a href="/Patch_6.1.1.14406" title="Patch 6.1.1.14406">Patch 6.1.1.14406</a>,
<a href="/Patch_6.0.0.13921" title="Patch 6.0.0.13921">Patch 6.0.0.13921</a>,
<a href="/Patch_5.2.2.13807" title="Patch 5.2.2.13807">Patch 5.2.2.13807</a>,
<a href="/Patch_5.2.0.13740" title="Patch 5.2.0.13740">Patch 5.2.0.13740</a>,
<a href="/Patch_5.2.0.13714" title="Patch 5.2.0.13714">Patch 5.2.0.13714</a>,
<a href="/Patch_5.2.0.13619" title="Patch 5.2.0.13619">Patch 5.2.0.13619</a>,
<a href="/Patch_5.0.0.13030" title="Patch 5.0.0.13030">Patch 5.0.0.13030</a>,
<a href="/Patch_5.0.0.12574" title="Patch 5.0.0.12574">Patch 5.0.0.12574</a>,
<a href="/Patch_4.3.0.12266" title="Patch 4.3.0.12266">Patch 4.3.0.12266</a>,
<a href="/Patch_4.2.0.12051" title="Patch 4.2.0.12051">Patch 4.2.0.12051</a>,
<a href="/Patch_4.1.0.10956" title="Patch 4.1.0.10956">Patch 4.1.0.10956</a>,
<a href="/Patch_4.0.0.10833" title="Patch 4.0.0.10833">Patch 4.0.0.10833 - The League of Explorers</a>,
<a href="/Patch_3.2.0.10604" title="Patch 3.2.0.10604">Patch 3.2.0.10604</a>,
<a href="/Patch_3.1.0.10357" title="Patch 3.1.0.10357">Patch 3.1.0.10357</a>,
<a href="/Patch_3.0.0.9786" title="Patch 3.0.0.9786">Patch 3.0.0.9786 - The Grand Tournament Draws Near</a>,
<a href="/Patch_2.8.0.9554" title="Patch 2.8.0.9554">Patch 2.8.0.9554</a>,
<a href="/Patch_2.7.0.9166" title="Patch 2.7.0.9166">Patch 2.7.0.9166</a>,
<a href="/Patch_2.6.0.8834" title="Patch 2.6.0.8834">Patch 2.6.0.8834</a>,
Use re to filter for the tags you want.
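If only the first matching link is needed, find() accepts the same keyword filter as find_all(). Here is a minimal sketch, assuming the newest patch link appears first in the page. The HTML below is a hypothetical sample, not the real page source.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption for illustration).
html = """
<ul>
  <li><a href="/Hearthstone">Hearthstone</a> <a href="/Patch_7.0.0.15590">Patch 7.0.0.15590</a></li>
  <li><a href="/Hearthstone">Hearthstone</a> <a href="/Patch_6.2.0.15300">Patch 6.2.0.15300</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# A compiled pattern is matched against each tag's href attribute.
pattern = re.compile(r"/Patch_")

# find() returns the first match in document order (or None if there is none).
first_patch = soup.find(href=pattern)

# find_all() returns every match; here we keep only the link text.
all_patches = [a.text for a in soup.find_all(href=pattern)]

print(first_patch.text)
print(all_patches)
```

This avoids depending on list positions such as find_all('ul')[2], which break if the page layout changes.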
There are five kinds of filters you can use in find() or find_all():