Question

For my task, I am trying to scrape information from the following website: https://www.blueroomcinebar.com/movies/now-showing/.

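My code needs to find the movie names, times and posters. The times and posters end up in the lists I create in the same order as they appear in the HTML, but the names seem to be in alphabetical order.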

We are not allowed to use BeautifulSoup.

This is my current code for scraping the movies:

Currently, the names appear in the list in this order:

from re import findall
from urllib.request import urlopen

movies_name = []
movies_times = []
movies_image = []

movies_list = []

movies_page = urlopen("https://www.blueroomcinebar.com/movies/now-showing/").read().decode('utf-8')

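# Find the names, session times and poster URLs of the movies at Blue Room
find_movie_names = findall(r'<h1>(.*?)</h1>', movies_page)
# (?:AM|PM) keeps the alternation next to the time; with a bare "AM|PM" the
# group matches either "H:MM AM" or just "PM", so PM sessions are missed
find_movie_times = findall(r'<p>([0-9]{1,2}:[0-9]{2} (?:AM|PM))</p>', movies_page)
find_movie_image = findall(r'<div class="poster" style="background-image: url\((.*?)\)">', movies_page)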

print(find_movie_names)
#Add movies to arrays
for movie in find_movie_names:
    movies_name.append(movie)
for movie in find_movie_times:
    movies_times.append(movie)
for movie in find_movie_image:
    movies_image.append(movie)

print(movies_name)
print(movies_image)

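# Combine name, time and poster for each movie; this assumes the three lists
# are aligned index-for-index (an index of movie - 1 would wrap to the last poster for movie 0)
for movie in range(len(movies_name)):
    movies_list.append("{};{};{}".format(movies_name[movie], movies_times[movie], movies_image[movie]))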

They should be in the following order:

['Aladdin', 'Avengers: Endgame', 'Chandigarh Amritsar Chandigarh', 'John Wick - Parabellum', 'Long Shot', 'Pokemon Detective Pikachu', 'Poms', 'The Hustle', 'Top End Wedding']

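N.B. There may be a second movie showing under OCAP. I'm not 100% sure why that is, but it seems to be some kind of special screening that plays a different movie each day.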

Why do the names matched by my regex come out in alphabetical order?
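
As a sanity check, here is a minimal sketch run on a made-up HTML fragment. The markup and the order of the titles in `sample_page` are assumptions for illustration, not the real page. `findall` returns matches in the order they occur in the text, so on its own it should not sort anything:

```python
from re import findall

# Hypothetical fragment (not the real site's markup): the titles are
# deliberately listed in a non-alphabetical order.
sample_page = """
<div class="movie"><h1>Poms</h1></div>
<div class="movie"><h1>Aladdin</h1></div>
<div class="movie"><h1>Long Shot</h1></div>
"""

# Same pattern as in the scraper; matches come back in document order.
names = findall(r'<h1>(.*?)</h1>', sample_page)
print(names)  # ['Poms', 'Aladdin', 'Long Shot']
```

If that holds for the real page too, an alphabetical result would suggest that the `<h1>` tags are already in alphabetical order in the HTML returned by `urlopen`, which may differ from what the browser displays.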

0 answers: