Unable to get all the information when I print the two categories together

Posted: 2018-02-04 21:55:15

Tags: python python-3.x web-scraping beautifulsoup

I've written some code in Python to scrape movie names along with some additional information related to those movies. If I print the two items separately, with print(movie) in the middle portion of my script and print(addinfo) at the bottom, everything comes through.

However, when I try to print them together at the bottom, I only get the movie names that also have additional information (the additional information is retrieved from a link attached to each movie name; the problem is that most movie names do not carry such a link).

For example, if there are 5 movie names and only 3 of them have extra links, then when I print them together I get only those three names with their additional information, whereas all 5 names should be printed. I want the names without extra information to be printed as well. How can I solve this? Thanks in advance. I think the site address and the HTML details are irrelevant here because the code itself runs fine, but I am pasting the full code for your consideration.

The script I've tried:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

URL = "https://in.bookmyshow.com/vizag/movies"

res = requests.get(URL)
soup = BeautifulSoup(res.text, 'lxml')
for item in soup.select(".card-container"):
    movie = item.select_one(".__movie-name").text.strip()

    print(movie)  # I do not wish to print it here; I expect to print both (movie and addinfo) together

    blink = item.select_one(".book-button a")
    if blink:
        req = requests.get(urljoin(URL,blink['href']))
        soup = BeautifulSoup(req.text,"lxml")
        addinfo = ' '.join([item.select_one(".__venue-name").text.strip() for item in soup.select(".listing-info")])

        print(movie,addinfo) # if I print both of them together here, I only get the movies that have additional information

2 Answers:

Answer 0 (score: 1)

Initialize addinfo as an empty string before the if block; print(movie, addinfo) then runs for every movie, and the ones without a booking link simply print with nothing after the name.

Code:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

URL = "https://in.bookmyshow.com/vizag/movies"

res = requests.get(URL)
soup = BeautifulSoup(res.text, 'html.parser')
for item in soup.select(".card-container"):
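    # start addinfo as an empty string so every movie gets printed, even one without a booking link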
    addinfo = ''
    movie = item.select_one(".__movie-name").text.strip()
    blink = item.select_one(".book-button a")
    if blink:
        req = requests.get(urljoin(URL,blink['href']))
        soup = BeautifulSoup(req.text,"lxml")
        addinfo = ' '.join([item.select_one(".__venue-name").text.strip() for item in soup.select(".listing-info")])
    print(movie, addinfo)

Output:

Tholi Prema Gokul A/C DTS: Vizag
Howrah Bridge INOX: CMR Central, Maddilapalem INOX: Varun Beach, Beach Road INOX: Vizag Chitralaya Mall Satyam A/C Dts: Gopalapatnam V Max: Vizag
Chalo Ganesh A/C Dts: Tagarapuvalasa INOX: CMR Central, Maddilapalem INOX: Varun Beach, Beach Road INOX: Vizag Chitralaya Mall Mukta A2 Cinemas: Vizag Central, Vizag Mohini Mini: Gajuwaka Mohini 70mm Dolby Atmos: Gajuwaka Narasimha a/c Dts: Gopalapatnam Sri Lakshmi Narasimha Picture Palace: Vizag Sri Venkateshwara Screen 1: Vizag Sarat Theater - 4K Dolby Atmos: Vizag
Touch Chesi Chudu INOX: CMR Central, Maddilapalem INOX: Varun Beach, Beach Road INOX: Vizag Chitralaya Mall Mukta A2 Cinemas: Vizag Central, Vizag Raja Cine Max 2K  A/c Dts: Kothavalasa Sharada 4K: Vizag Sri Rama Picture Palace: Vizag Tata Picture Palace A/c Dts: Tagarapuvalasa V Max: Vizag
Bhaagamathie INOX: CMR Central, Maddilapalem INOX: Varun Beach, Beach Road INOX: Vizag Chitralaya Mall Jagadamba 4k: Vizag Kinnera Cinema: Maddilapalem Mukta A2 Cinemas: Vizag Central, Vizag Sri Ramulamma Theatre, Thagarapuvalasa: Vizag Sri Lakshmi Narasimha Picture Palace: Vizag Shankara A/C Dts: Gopalapatnam Sri Jaya A/c Dts: Kothavalasa
Padmaavat 
Gang Gokul A/C DTS: Vizag Sri Parameswari Picture Palace: Kancharapalem
Jai Simha Mourya Theatre: Gopalapatnam Sree Leela Mahal: Vizag Saptagiri Theatre: Chittivalasa
Maze Runner: The Death Cure INOX: Varun Beach, Beach Road INOX: Vizag Chitralaya Mall Ramadevi 4K: Vizag
Jumanji: Welcome To The Jungle INOX: Vizag Chitralaya Mall
Hey Jude INOX: Varun Beach, Beach Road
Green Apple 
Sollividava 
Tagaru 
Savarakathi 
KEE 
Prema Baraha 
Befaam 
Shadow 
Rosapoo 
Aapla Manus 
Kalakalappu 2 
Kumari 21 F 
Karu 
Kirrak Party 
Gayatri 
Inttelligent 
KEY 
Downup The Exit 796 
Pad Man 
The Boy and The World 
The 15:17 to Paris 
Leera The Soulmates 
Aiyaary 
Kanam 

Answer 1 (score: 1)

If you use an else block, another approach could look like this:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

URL = "https://in.bookmyshow.com/vizag/movies"

res = requests.get(URL)
soup = BeautifulSoup(res.text, 'lxml')
for item in soup.select(".card-container"):
    movie = item.select_one(".__movie-name").text.strip()
    blink = item.select_one(".book-button a")

    if blink:
        req = requests.get(urljoin(URL,blink['href']))
        soup = BeautifulSoup(req.text,"lxml")
        addinfo = ' '.join([item.select_one(".__venue-name").text.strip() for item in soup.select(".listing-info")])

        print(movie,addinfo)
    else:
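        # no booking link: print just the movie name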
        print(movie)
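
As a further variant, here is a sketch of the same idea (assuming the CSS selectors above still match the page; the get_venues helper and the shared Session are illustrative choices, not part of either answer): the venue lookup is factored into a helper, the (movie, addinfo) pairs are collected before printing, and movies without a booking link fall back to an empty string:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

URL = "https://in.bookmyshow.com/vizag/movies"

# One shared session for the listing page and every booking page.
session = requests.Session()

def get_venues(link):
    # Fetch a booking page and join its venue names (same selectors as above).
    page = BeautifulSoup(session.get(link).text, "lxml")
    return ' '.join(v.select_one(".__venue-name").text.strip()
                    for v in page.select(".listing-info"))

soup = BeautifulSoup(session.get(URL).text, "lxml")
results = []
for item in soup.select(".card-container"):
    movie = item.select_one(".__movie-name").text.strip()
    blink = item.select_one(".book-button a")
    # Fall back to an empty string when there is no booking link.
    addinfo = get_venues(urljoin(URL, blink['href'])) if blink else ''
    results.append((movie, addinfo))

for movie, addinfo in results:
    print(movie, addinfo)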