from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests

r = requests.get('http://www.ign.com/tv/reviews')
c = r.text
# print(c)
soup = BeautifulSoup(c, 'html.parser')
x = soup.find_all('div', class_='item-title')
for item in x:
    print(item)
print('--------------------------------------------------')

lobbying = {}
for element in x:
    lobbying[element.a.get_text()] = {}
# print(lobbying)  # This is a dictionary object
for key, value in lobbying.items():
    print(key, value)

for element in x:
    lobbying[element.a.get_text()]["link"] = element.a["href"]
for key, value in lobbying.items():
    print(key, value, sep='\n', end='\n\n')
Here I first find the date and the score, then insert what was found into the dictionary.
f = soup.find_all('div', class_='itemList-item')
reviewItems = {}
for item in f:
    score = item.find("span", class_="scoreBox-scorePhrase").getText()
    date = item.find_all("div", class_="grid_3")[1].getText().strip()
    lobbying[element.a.get_text()]["score"] = score
    lobbying[element.a.get_text()]["date"] = date
for key, value in lobbying.items():
    print(key, value)
Answer 0 (score: 0)
I was able to get the information you need for each item with the following:
f = soup.find_all('div', class_='itemList-item')
reviewItems = {}
for item in f:
    review = {}
    review["score"] = item.find("span", class_="scoreBox-scorePhrase").getText()
    review["date"] = item.find_all("div", class_="grid_3")[1].getText().strip()
    review["link"] = item.find("a", class_="scoreBox-link")["href"]
    reviewItems[item.find("div", class_="item-title").getText().strip()] = review
for key, value in reviewItems.items():
    print(key, value)
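As a side note on why the loop in the question fills in only one entry: inside `for item in f`, the name `element` is never reassigned, so it still refers to the last title from the earlier loop and every score and date lands on that one key. Here is a minimal sketch of the effect, using made-up titles and values rather than anything scraped from the page (`titles`, `rows` and both helper functions are hypothetical):

```python
# Hypothetical stand-in data: titles from a first pass, (score, date) pairs from a second.
titles = ["Show A", "Show B", "Show C"]
rows = [("Good", "Dec 1"), ("Great", "Dec 2"), ("Amazing", "Dec 3")]


def fill_with_stale_name(titles, rows):
    """Mimic the question's pattern: reuse a loop variable after its loop has ended."""
    reviews = {}
    for title in titles:
        reviews[title] = {}
    # `title` is still bound to the last title here, so every row
    # overwrites the same dictionary entry.
    for score, date in rows:
        reviews[title]["score"] = score
        reviews[title]["date"] = date
    return reviews


def fill_per_row(titles, rows):
    """Pair each title with its own row so every entry gets its own values."""
    reviews = {}
    for title, (score, date) in zip(titles, rows):
        reviews[title] = {"score": score, "date": date}
    return reviews


stale = fill_with_stale_name(titles, rows)
fixed = fill_per_row(titles, rows)
print(stale)
print(fixed)
```

Reading the title, score, date and link from the same `item`, as in the code above, avoids the problem without needing to pair two lists at all.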
If the classes you are using are too generic (such as grid_3), try to find more specific ones. In this case, those are scoreBox or itemList-item.
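To illustrate the difference, here is a small sketch run against a made-up HTML fragment. The markup is an assumption that only loosely imitates the class names mentioned here, and the real page may be structured differently:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one grid_3 div outside the review list, two inside it.
html = """
<div class="grid_3">site navigation</div>
<div class="itemList-item">
  <div class="item-title"><a href="/articles/example-a">Example A</a></div>
  <div class="grid_3"><span class="scoreBox-scorePhrase">Great</span></div>
  <div class="grid_3">Dec 5, 2016</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A generic class matches unrelated layout elements as well.
generic = soup.find_all("div", class_="grid_3")

# Scoping to the review container first, then to a specific class,
# returns only what belongs to each review.
items = soup.find_all("div", class_="itemList-item")
specific = [
    item.find("span", class_="scoreBox-scorePhrase").get_text()
    for item in items
]

print(len(generic))
print(specific)
```

Searching inside each `itemList-item` keeps matches from the rest of the page out of the results.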
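You said you want to get the date from the URL, but I think you mean that you want to get it from the div that contains it. In this particular case, it seems you can take the second of the grid_3 elements found within each itemList-item.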
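Indexing with `[1]` raises an `IndexError` if an item has fewer than two grid_3 divs. If that is a concern, a guarded lookup could look roughly like this. The helper `review_date` and the sample markup are hypothetical and not taken from the page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second item lacks a date cell.
html = """
<div class="itemList-item">
  <div class="grid_3">Great</div>
  <div class="grid_3"> Dec 5, 2016 </div>
</div>
<div class="itemList-item">
  <div class="grid_3">Okay</div>
</div>
"""


def review_date(item):
    """Return the text of the second grid_3 div in `item`, or None if it is missing."""
    cells = item.find_all("div", class_="grid_3")
    if len(cells) < 2:
        return None
    return cells[1].get_text().strip()


soup = BeautifulSoup(html, "html.parser")
dates = [
    review_date(item)
    for item in soup.find_all("div", class_="itemList-item")
]
print(dates)
```

Items without a date then show up as `None` instead of stopping the whole loop.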
In any case, this prints out 25 items like the following.
Ash vs Evil Dead - Home Again {'link': 'http://www.ign.com/articles/2016/12/04/ash-vs-evil-dead-home-again-review', 'date': 'December 5, 2016', 'score': 'Amazing'}