From http://www.ign.com/tv/reviews

Time: 2016-12-14 03:28:10

Tags: python html web web-scraping beautifulsoup

I am trying to achieve the following result (the screenshot is not reproduced here). My code, which currently does not work, is attached:

from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests

r = requests.get('http://www.ign.com/tv/reviews')
c=r.text
# print(c)
soup = BeautifulSoup(c, 'html.parser')

x=soup.find_all('div', class_='item-title')
for item in x:
    print(item)
    print('--------------------------------------------------')
lobbying = {} 
for element in x:
    lobbying[element.a.get_text()] = {}

#print (lobbying)  # This is a dictionary object
for key,value in lobbying.items(): 
    print(key,value)
for element in x:
    lobbying[element.a.get_text()]["link"] = element.a["href"]

for key,value in lobbying.items(): 
    print(key,value, sep='\n', end='\n\n')

This part first finds the date and the score, then inserts what we found into the dictionary.

f = soup.find_all('div', class_='itemList-item')
reviewItems ={}

for item in f:
    score = item.find("span", class_="scoreBox-scorePhrase").getText()
    date = item.find_all("div", class_="grid_3")[1].getText().strip()
    lobbying[element.a.get_text()]["score"] = score
    lobbying[element.a.get_text()]["date"] = date

for key,value in lobbying.items(): 
    print(key,value)

1 answer:

Answer 0 (score: 0)

I was able to get the required information with the following:

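# `soup` is the BeautifulSoup object built in the question's code.
# Each review on the page sits in its own itemList-item container.
f = soup.find_all('div', class_='itemList-item')

reviewItems = {}

for item in f:
    # Read every field from the same container so the values stay paired.
    review = {}
    review["score"] = item.find("span", class_="scoreBox-scorePhrase").getText()
    # The date is the second grid_3 div inside the container.
    review["date"] = item.find_all("div", class_="grid_3")[1].getText().strip()
    review["link"] = item.find("a", class_="scoreBox-link")["href"]
    # Key the result by the review title.
    reviewItems[item.find("div", class_="item-title").getText().strip()] = review

for key, value in reviewItems.items():
    print(key, value)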
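
One likely reason the code in the question fails: its last loop iterates over f but still indexes the dictionary with element, the variable left over from the earlier loop over x. Every score and date is therefore written to the last title only. A minimal sketch of that effect, using made-up titles and scores (assumptions, not real page data):

```python
# Hypothetical sample data standing in for the scraped titles and scores.
titles = ["Show A", "Show B", "Show C"]
scores = ["Good", "Great", "Amazing"]

reviews = {}
for title in titles:
    reviews[title] = {}

# Buggy pattern: `title` still holds the last value from the loop above,
# so every score overwrites the same dictionary entry.
for score in scores:
    reviews[title]["score"] = score

buggy = {name: dict(info) for name, info in reviews.items()}

# Fixed pattern: pair each title with its own score.
fixed = {name: {"score": score} for name, score in zip(titles, scores)}

print(buggy)
print(fixed)
```

Reading the title, score, date and link from the same container in one loop pass, as the code above does, avoids the problem.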

If the class you are using is too generic (like grid_3), try to find a more specific class. In this case, that means scoreBox and itemList-item.
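
To illustrate the difference, here is a small sketch that runs against a simplified, hypothetical HTML snippet instead of the live page. Only the class names are taken from this thread; the markup itself, the sample text and the variable names are assumptions.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup: only the class names come from this thread.
sample_html = """
<div class="itemList-item">
  <div class="grid_3"><span class="scoreBox-scorePhrase">Amazing</span></div>
  <div class="grid_3">sample date</div>
</div>
<div class="grid_3">unrelated sidebar block</div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# A generic class also matches elements outside the review container.
generic = sample_soup.find_all("div", class_="grid_3")

# A CSS selector scoped to the container matches only the score phrase.
specific = sample_soup.select("div.itemList-item span.scoreBox-scorePhrase")
phrases = [tag.get_text(strip=True) for tag in specific]

print(len(generic))
print(phrases)
```

The generic lookup also returns the unrelated block, while the scoped selector returns only the score inside the review container.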

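You said you wanted to get the date from the URL, but I think you meant that you want to get it from the div that holds it. In this particular case, it seems you can take the second of the grid_3 elements found inside each itemList-item.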
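
Indexing with [1] raises an IndexError if a container has fewer than two grid_3 divs. A defensive sketch, again on hypothetical markup (extract_date is a helper name made up for this example, not part of BeautifulSoup):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second item deliberately lacks a date column.
sample_html = """
<div class="itemList-item">
  <div class="grid_3">score column</div>
  <div class="grid_3"> December 5, 2016 </div>
</div>
<div class="itemList-item">
  <div class="grid_3">score column only</div>
</div>
"""


def extract_date(item):
    """Return the text of the second grid_3 div in item, or None if absent."""
    columns = item.find_all("div", class_="grid_3")
    if len(columns) < 2:
        return None
    return columns[1].get_text(strip=True)


sample_soup = BeautifulSoup(sample_html, "html.parser")
items = sample_soup.find_all("div", class_="itemList-item")
dates = [extract_date(item) for item in items]
print(dates)
```

Whether the real page ever omits the date column is not something this thread confirms, so treat the guard as a precaution.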

In any case, this prints out 25 items like the following.

  

Ash vs Evil Dead - Home Again {'link': 'http://www.ign.com/articles/2016/12/04/ash-vs-evil-dead-home-again-review', 'date': 'December 5, 2016', 'score': 'Amazing'}