Scraping the seat layout page of bookmyshow using Python

Date: 2018-02-04 08:38:06

Tags: python session web-scraping beautifulsoup python-requests

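I am trying to scrape the bookmyshow website to find movie details, such as the times at which tickets are available and how many seats are available. I have figured out how to get the show times for which seats are available, but now I want to get the total number of seats available in that show. My code is: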

import requests
from bs4 import BeautifulSoup
import json
base_url = "https://in.bookmyshow.com"
s =requests.session()
headers = {"user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"}
r = s.get("https://in.bookmyshow.com/vizag/movies", headers = headers)
print(r.status_code)
soup = BeautifulSoup(r.text,"html.parser")
movies_list = soup.find("div",{"class":"__col-now-showing"})
movies = movies_list.findAll("a",{"class":"__movie-name"})
for movie in movies:
    print(movie.text)
show = []
containers = movies_list.findAll("div",{"class":"card-container"})
for container in containers:
    try:
        detail = container.find("div",{"class":"__name overflowEllipses"})
        button = container.find("div",{"class":"book-button"})
        print(detail.text)
        print(button.a["href"])
        url_ticket = base_url + button.a["href"]
        show.append(url_ticket)
    except (AttributeError, TypeError, KeyError):
        # skip cards that lack a name or a booking link
        pass
for i in show:
    print(i)
for t in show:
    res = s.get(t,headers=headers)
    bs = BeautifulSoup(res.text,"html.parser")
    movie_name = bs.find("div",{"class":"cinema-name-wrapper"})
    print(movie_name.text.replace(" ","").replace("\t","").replace("\n",""))
    venue_list = bs.find("ul",{"id":"venuelist"})
    venue_names = venue_list.findAll("li",{"class":"list"})
    try:
        for i in venue_names:
            vn = i.find("div",{"class":"__name"})
            print(vn.text.replace(" ","").replace("\t","").replace("\n",""))
            show_times = i.findAll("div",{"data-online":"Y"})
            for st in show_times:
                print(st.text.replace(" ","").replace("\t","").replace("\n",""))
    except AttributeError:
        # skip venues whose markup lacks the expected name element
        pass

    print("\n")
heads = {
    "accept":"*/*",
"accept-encoding":"gzip, deflate, br",
"accept-language":"en-US,en;q=0.9",
"origin":"https://in.bookmyshow.com",
"referer":"https://in.bookmyshow.com/buytickets/chalo-vizag/movie-viza-ET00064364-MT/20180204",
"user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
}
rr = s.post("https://b-eu.simility.com/b?c=bookmyshow&v=1.905&ec=BLOFaZ2HdToCxwcr&cl=0&si=5a76bfce6ae4a00027767ae9&sc=3B0CB9F4-4A27-4588-9FB4-A2A2760569BC&uc=D834EDA4-57E4-4889-A34F-473AC6BBDDBB&e=Seatlayout&cd=.simility.com&r=0&st=1517731803171&s=792a6c66313a2032223133302633343a2c393a322e3c202422636e312a382037633f3c606669673e61653e6338323230353f3c35616f3b2a2c2269663a203820606765696d7371606f77282e2a61663320327e70756f2e2a63643e20326c776e6e242861643f20326e75666e24206166342a306c75666e2422636e352a386c776e64262073692032223348324b403b4436253e43323d2f3c3538322f314440362f493843323d3438353633404b202e20776b2838224e3a3b34454e433c2f3735473c273638323b2541333e4425363531434b3c40424e464a422226206a66303120326c636c79672422626e303a203864636479672c28716c32342838253131322e2a7966323f203231353b353f31333a323b3b353326207b643428382a32202e207b6e302230767a756526207b663420382a6f6c2d5f512a2c2279663f203859206d642f5559202422656420552e2071663028383026207b6431392032204f6d7861666e6125372630202255616c666d757b2a4c542a33382e3031225f6b6c3436332a7a363e2b2841707a6e6d55676049617e2d3539352633362a2a434a564f4e242a6e6961672847656969672b22416a7a656f6525343b2e3024313a313b2c333b3822536b6469726925373b352c31342a2620736e3338223a2855616c313020242871643b362a3a224d6d67656e67224164612e282e2a73643b342a383a3036242871643b352a3a313f313e2e2071663932203a32343c2c227966393b2038333d39342c28716c323028383a362e20716c38332230303c2c22686639362038767a7f672c28606c313628383b2e206066393d203a282f3a30303f363c353a3a332a2620626e3330223a282024207565332a3076727f672422776d302a385920756d68656c282e2a65787a677a6b6f676c7c6b6e2d7d676a676c285f24207565342a3020576f60436974282e2a756535203228556568496174205d676a454e202e2a7d65323d203274727f6724207565312a30202d3b333c3833323a31333a202e2a7a66312838535b226b72786e6b61637c636d6e257a25676f656564672f616a7a656f6527726c66222620616c766770666b6e2d7a666e2d7663677f6770202e2a496a72656f6d20504e4428526e77656164202c6477646c5d26592a6372726e61696374636d662f706e642a2e206f6a626c606d6e656b666a68607863676d68676c6d6865676e67696f6a
62636b202e2a496a72656f6d20504e4428546b67756d78202c6477646c5d26592a6372726e61696374636d662f78276c69616e2e63787a6e6969637c696f642d702f726c636b66202c286b667465786c696e2f6c636b662f7066776f696e282e2a4c63766b7e6f2243666b6d6e74282e66776e6e5f245120617a726469636b76616d6c2d7a257a72617a6b2577696e677e6b6c672f6b6e6f2226207f69646f74616c676166656b66617a766d722e6e6e64202e2055616e6776636c6d2043656c7c676c76224c6f617273727c696f6422456d66776e6d282e223b2c3c2e38243338303b205f5577",headers =heads) # i got the link while i was inspecting the booking tickets page
f = s.get("https://in.bookmyshow.com/buytickets/chalo-vizag/movie-viza-ET00064364-MT/20180204#!seatlayout") # this is the page that is displayed when we click a show time
ff = f.text
j = json.loads(ff)
print(j)

Once I have the source of this page, I can easily get the seats. But I am unable to get that page. How can I do this? Thanks in advance!

1 Answer:

Answer 0: (score: 0)

Steps: 1) Use Selenium to click the show-time block:

# find_element_by_xpath was removed in Selenium 4.3; this form works in Selenium 3 and 4
driver.find_element("xpath", '<enter xpath>').click()

To find the XPath, use Inspect Element, right-click the element, and choose Copy; you will get a Copy XPath option.

time.sleep(4)  # wait 4 seconds for the page to be displayed

2) Get the HTML source with:

html = driver.page_source

Then use Beautiful Soup to scrape the page:

soup = BeautifulSoup(html,'html.parser')

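Find all tags with class='_available' that contain an a href and count them. Find all tags with class='_blocked' that contain an a href and count them. Using these two counts, you can work out the total number of seats and the number of available seats.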
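The counting step can be sketched roughly as follows. This is only an illustration: the sample markup, the assumption that each seat is an `<a>` tag carrying an `href`, and the helper name `count_seats` are all hypothetical; only the class names `_available` and `_blocked` come from the description above, and the real seat-layout page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the real seat-layout HTML may differ.
# Only the class names `_available` and `_blocked` are taken from the answer.
SAMPLE_HTML = """
<table id="layout">
  <tr>
    <td><a class="_available" href="javascript:;">1</a></td>
    <td><a class="_available" href="javascript:;">2</a></td>
    <td><a class="_blocked" href="javascript:;">3</a></td>
  </tr>
  <tr>
    <td><a class="_blocked" href="javascript:;">4</a></td>
    <td><a class="_available" href="javascript:;">5</a></td>
    <td><span class="_available">no link, not counted</span></td>
  </tr>
</table>
"""


def count_seats(html):
    """Return (available, blocked, total) seat counts found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    # Count only <a> tags that have an href and the given class (assumed seat markup).
    available = len(soup.find_all("a", class_="_available", href=True))
    blocked = len(soup.find_all("a", class_="_blocked", href=True))
    return available, blocked, available + blocked


available, blocked, total = count_seats(SAMPLE_HTML)
print(f"available={available} blocked={blocked} total={total}")
```

With Selenium, the same helper could be applied to the live page by passing `driver.page_source` after the show time has been clicked and the layout has loaded.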