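I'm trying to scrape Wikipedia's current events page: https://en.wikipedia.org/wiki/Portal:Current_events. Specifically, I want the entry for the current date. Using Inspect Element, I can see that everything I want is stored in the div with id "2020_June_15". My script specifies that id, but it still extracts everything from the page. What am I missing?
Here is the Python script, wiki.py: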
import sys
import requests
import bs4
res = requests.get('https://en.wikipedia.org/wiki/Portal:Current_events')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, "lxml")
elems = soup.select('div', {"id": "2020_June_15"})
for i in range(len(elems)):
    print(elems[i].getText())
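Answer 0 (score: 0)
Replace soup.select with soup.find_all. select() takes a CSS selector string, so the {"id": ...} dictionary is not applied as an attribute filter and every div matches; find_all() accepts that dictionary as its attrs argument: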
import sys
import requests
import bs4
res = requests.get('https://en.wikipedia.org/wiki/Portal:Current_events')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, "lxml")
# find_all() returns a list of every div whose id attribute matches
elems = soup.find_all('div', {"id": "2020_June_15"})
for elem in elems:
    print(elem.getText())
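If you would rather keep select(), the id has to go into the selector string itself. Here is a minimal offline sketch comparing the two approaches. The inline HTML is a made-up stand-in for the real page, and I use the built-in "html.parser" only so the snippet runs without lxml. Note that a plain `#2020_June_15` selector is not valid CSS because the identifier starts with a digit, so the attribute form is used instead.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
html = """
<div id="2020_June_14"><p>Yesterday's events</p></div>
<div id="2020_June_15"><p>Today's events</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Attribute dictionary: understood by find_all()/find().
by_find_all = soup.find_all("div", {"id": "2020_June_15"})

# CSS attribute selector: the form select() expects.
by_select = soup.select('div[id="2020_June_15"]')

texts_find_all = [d.get_text(strip=True) for d in by_find_all]
texts_select = [d.get_text(strip=True) for d in by_select]
print(texts_find_all)
print(texts_select)
```

Both calls should pick out only the second div here. Against the live page the same two lines apply unchanged, as long as the page still uses that id.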
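Answer 1 (score: 0)
You're really close. Use find instead of select. Note that find returns a single element (or None when nothing matches) rather than a list, so no loop is needed:
import sys
import requests
import bs4
res = requests.get('https://en.wikipedia.org/wiki/Portal:Current_events')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, "lxml")
# find() returns the first matching Tag, or None if no div has this id
elem = soup.find('div', {"id": "2020_June_15"})
if elem is not None:
    print(elem.getText())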
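Since the question is about the current date, the id could also be built from the date instead of being hard-coded. This is only a sketch: the helper name current_events_id is mine, and it assumes the page keeps the "YYYY_Month_D" pattern seen in "2020_June_15", with no zero padding on the day. I have not checked that pattern beyond the id in the question.

```python
from datetime import date, datetime, timezone

# English month names spelled out so the result does not depend on the locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)

def current_events_id(day: date) -> str:
    """Build an id like '2020_June_15' (assumed pattern, hypothetical helper)."""
    return f"{day.year}_{MONTHS[day.month - 1]}_{day.day}"

# UTC is used here on the assumption that the page rolls over on UTC time.
today_id = current_events_id(datetime.now(timezone.utc).date())
print(today_id)
```

You would then pass it to the lookup, e.g. soup.find('div', {"id": today_id}), and still check for None in case today's section has not been created yet.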