Question

I am trying to achieve the following result. My code, which currently does not work, is attached:

from bs4 import BeautifulSoup
import requests

r = requests.get('http://www.ign.com/tv/reviews')
c = r.text
# print(c)
soup = BeautifulSoup(c, 'html.parser')

x = soup.find_all('div', class_='item-title')
for item in x:
    print(item)
    print('--------------------------------------------------')
lobbying = {} 
for element in x:
    lobbying[element.a.get_text()] = {}

#print (lobbying)  # This is a dictionary object
for key,value in lobbying.items(): 
    print(key,value)
for element in x:
    lobbying[element.a.get_text()]["link"] = element.a["href"]

for key,value in lobbying.items(): 
    print(key,value, sep='\n', end='\n\n')

This part first finds the date and the score, then inserts what we found into the dictionary.

f = soup.find_all('div', class_='itemList-item')
reviewItems ={}

for item in f:
    score = item.find("span", class_="scoreBox-scorePhrase").getText()
    date = item.find_all("div", class_="grid_3")[1].getText().strip()
    lobbying[element.a.get_text()]["score"] = score
    lobbying[element.a.get_text()]["date"] = date

for key,value in lobbying.items(): 
    print(key,value)

Answer 1

I was able to get the information you need with the following:

# Each review entry on the page sits in its own itemList-item div.
f = soup.find_all('div', class_='itemList-item')

reviewItems = {}

for item in f:
    # Search inside the current item only, so the fields stay paired.
    review = {}
    review["score"] = item.find("span", class_="scoreBox-scorePhrase").getText()
    review["date"] = item.find_all("div", class_="grid_3")[1].getText().strip()
    review["link"] = item.find("a", class_="scoreBox-link")["href"]
    # Key the dictionary by the review title.
    reviewItems[item.find("div", class_="item-title").getText().strip()] = review

for key, value in reviewItems.items():
    print(key, value)
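
For context, the second loop in the question writes through `element`, a variable left over from an earlier loop, so every score and date lands on the last title. The sketch below reproduces that effect with plain lists and shows one possible pairing fix; the titles and scores are made-up placeholders, not data from the page.

```python
# Hypothetical stand-ins for the scraped titles and score phrases.
titles = ["Show A", "Show B", "Show C"]
scores = ["Good", "Great", "Amazing"]

reviews = {}
for title in titles:
    reviews[title] = {}

# After the loop above, `title` still refers to the last element, "Show C",
# so this loop overwrites the same entry on every iteration.
for score in scores:
    reviews[title]["score"] = score

# One way to keep title and score together: iterate over both at once.
paired = {t: {"score": s} for t, s in zip(titles, scores)}

print(reviews)
print(paired)
```

Building each entry from a single `item`, as the code above does, avoids the problem because the title, score, date and link all come from the same element.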

If the class you are using is too generic (like grid_3), try to find a more specific class. In this case, that is scoreBox or itemList-item.
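
To illustrate the difference, here is a minimal sketch on a small inline snippet. The markup is an assumption modelled on the class names mentioned in this answer, not a copy of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class names are taken from the answer.
SAMPLE_HTML = """
<div class="grid_3">Site navigation</div>
<div class="itemList-item">
  <div class="grid_3"><span class="scoreBox-scorePhrase">Amazing</span></div>
  <div class="grid_3">December 5, 2016</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A generic layout class matches unrelated elements anywhere in the document.
generic_matches = soup.find_all("div", class_="grid_3")

# A specific class narrows the search to the review entries themselves.
review_items = soup.find_all("div", class_="itemList-item")
phrases = [
    item.find("span", class_="scoreBox-scorePhrase").get_text(strip=True)
    for item in review_items
]

print(len(generic_matches), len(review_items), phrases)
```

Here the generic search also picks up the navigation div, while the specific one returns only the review entry.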

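You said you want to get the date from the URL, but I think you mean that you want to get it from the div that contains it. In this particular case, it seems you can take the second of the grid_3 elements found inside each itemList-item.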
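
Indexing with [1] raises an IndexError if an item has fewer than two grid_3 divs. A defensive variant might look like the sketch below; `extract_date` is a hypothetical helper name and the markup is an invented sample, so treat it as an illustration only.

```python
from bs4 import BeautifulSoup

def extract_date(item):
    """Return the text of the second grid_3 div inside item, or None."""
    cells = item.find_all("div", class_="grid_3")
    if len(cells) < 2:
        return None  # this item has no date cell
    return cells[1].get_text(strip=True)

# Hypothetical markup: the second item is missing its date cell.
SAMPLE_HTML = """
<div class="itemList-item">
  <div class="grid_3">Score box</div>
  <div class="grid_3">
      December 5, 2016
  </div>
</div>
<div class="itemList-item">
  <div class="grid_3">Score box only</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
dates = [extract_date(item) for item in soup.find_all("div", class_="itemList-item")]
print(dates)
```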

In any case, this prints output like the following for 25 items.

Ash vs Evil Dead - Home Again {'link': 'http://www.ign.com/articles/2016/12/04/ash-vs-evil-dead-home-again-review', 'date': 'December 5, 2016', 'score': 'Amazing'}

From http://www.ign.com/tv/reviews
