我正在提取正在爬网的IMDB用户评论,
我需要一个用户的电影明星评分
和电影标题,用户的内容。
所以我将它保存在三个数组中。
标题和内容没问题
但是,[星级]缺失了一部分。
那是因为没有属性
缺少属性(star_score)部分
在星阵中
我想添加0或null。
答案 0 :(得分:0)
并非所有评论都会有评分,因此您需要考虑到这一点:
$ python3 test.py https://www.imdb.com/title/tt5113040/reviews
Got response: 200
Title: The Secret Life of Pets 2
# (8/10) Not as bad as some reviews on here
Let's get this straight it a film made for childre...
-----
ddriver385, 26 May 2019
# (7/10) A Good Film for the kids
This film is a good film to watch with the kids. C...
-----
xxharriet_hobbsxx, 27 May 2019
# (7/10) Worth a watch
Admittedly, it probably wasn't necessary to follow...
-----
MythoGenesis, 24 May 2019
# (No rating) Intense and entertaining
Narratively, the film is not without fault. In par...
-----
TheBigSick, 26 May 2019
...
test.py
import requests
import sys
import time
from bs4 import BeautifulSoup
def fetch(url):
with requests.Session() as s:
r = s.get(url, timeout=5)
return r
def main(url):
start_t = time.time()
resp = fetch(url)
print(f'Got response: {resp.status_code}')
html = resp.content
bs = BeautifulSoup(html, 'html.parser')
title = bs.find('h3', attrs={'itemprop': 'name'})
print(f'Title: {title.a.text}')
reviews = bs.find_all('div', class_='review-container')
for review in reviews:
title = review.find('a', class_='title').text.strip()
rating = review.find('span', class_='rating-other-user-rating')
if rating:
rating = ''.join(i.text for i in rating.find_all('span'))
rating = rating if rating else 'No rating'
user = review.find('span', class_='display-name-link').text
date = review.find('span', class_='review-date').text
content = review.find('div', class_='content').div.text
print(
f'# ({rating}) {title}\n'
f'{content[:50]}...\n'
f'{"-" * 5}\n'
f'{user}, {date}\n'
)
end_t = time.time()
elapsed_t = end_t - start_t
r_time = resp.elapsed.total_seconds()
print(f'Total: {elapsed_t:.2f}s, request: {r_time:.2f}s')
if __name__ == '__main__':
if len(sys.argv) > 1:
url = sys.argv[1]
main(url)
else:
print('URL is required.')
sys.exit(1)