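The following script is from this site. It does not currently work as-is, but I have gotten it working on my own computer (which I can't access at the moment). What I really want, though, is to use this script to return the tuple (self.tomatometer, self.audience) (see the function def _process(self)).
What I want to do is pass a list of movie titles to this script (in a for loop) and have it return the self.tomatometer and self.audience variables to the caller.
I managed to do this, but it seems discouraged and convoluted. Suppose I call this script convrt.py; this is what I did: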
import convrt
# this is what I'm doing, it's working, but seems weird.
convrt.RottenTomatoesRating("Movie Title Here")._process()
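PyCharm warns me that I am accessing a private method of a class. I know that nothing is truly private in Python and that this is so-called "name mangling", but I still suspect this is not the best way to get the tuple back from this script?
The original script: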
#!/usr/bin/env python

# RottenTomatoesRating
# Laszlo Szathmary, 2011 (jabba.laci@gmail.com)

from BeautifulSoup import BeautifulSoup
import sys
import re
import urllib
import urlparse

class MyOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15'

class RottenTomatoesRating:
    # title of the movie
    title = None
    # RT URL of the movie
    url = None
    # RT tomatometer rating of the movie
    tomatometer = None
    # RT audience rating of the movie
    audience = None
    # Did we find a result?
    found = False

    # for fetching webpages
    myopener = MyOpener()
    # Should we search and take the first hit?
    search = True

    # constant
    BASE_URL = 'http://www.rottentomatoes.com'
    SEARCH_URL = '%s/search/full_search.php?search=' % BASE_URL

    def __init__(self, title, search=True):
        self.title = title
        self.search = search
        self._process()

    def _search_movie(self):
        movie_url = ""

        url = self.SEARCH_URL + self.title
        page = self.myopener.open(url)
        result = re.search(r'(/m/.*)', page.geturl())
        if result:
            # if we are redirected
            movie_url = result.group(1)
        else:
            # if we get a search list
            soup = BeautifulSoup(page.read())
            ul = soup.find('ul', {'id' : 'movie_results_ul'})
            if ul:
                div = ul.find('div', {'class' : 'media_block_content'})
                if div:
                    movie_url = div.find('a', href=True)['href']

        return urlparse.urljoin( self.BASE_URL, movie_url )

    def _process(self):
        if not self.search:
            movie = '_'.join(self.title.split())
            url = "%s/m/%s" % (self.BASE_URL, movie)
            soup = BeautifulSoup(self.myopener.open(url).read())
            if soup.find('title').contents[0] == "Page Not Found":
                url = self._search_movie()
        else:
            url = self._search_movie()

        try:
            self.url = url
            soup = BeautifulSoup( self.myopener.open(url).read() )
            self.title = soup.find('meta', {'property' : 'og:title'})['content']
            if self.title: self.found = True

            self.tomatometer = soup.find('span', {'id' : 'all-critics-meter'}).contents[0]
            self.audience = soup.find('span', {'class' : 'meter popcorn numeric '}).contents[0]

            if self.tomatometer.isdigit():
                self.tomatometer += "%"
            if self.audience.isdigit():
                self.audience += "%"
        except:
            pass

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print "Usage: %s 'Movie title'" % (sys.argv[0])
    else:
        rt = RottenTomatoesRating(sys.argv[1])
        if rt.found:
            print rt.url
            print rt.title
            print rt.tomatometer
            print rt.audience
Answer 0 (score: 2)
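I don't believe you should be doing things this way.
_process() is prefixed with _ because it is meant to be a private method of the class, as PyCharm warned you. That means it should only be used inside the class itself, not by calling code like yours.
You are initializing an instance of the RottenTomatoesRating class with a movie title and then calling ._process() on that instance. When you call the constructor of the RottenTomatoesRating class, RottenTomatoesRating(movie_title), it executes the class's __init__() method, passing your movie title in as the title argument. The __init__() method also calls self._process(), which assigns a value to each of self.tomatometer and self.audience (if available). Calling ._process() again yourself is therefore redundant, and it has no return statement, so the call gives you None rather than the tuple. Instead, you can access these values directly on the instance: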
import convrt
ratings = convrt.RottenTomatoesRating("Movie Title Here")
tomatometer = ratings.tomatometer
audience = ratings.audience
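To cover the "list of movie titles in a for loop" part of the question, the same idea can be wrapped in a small helper. This is only a sketch: collect_ratings and StubRating are hypothetical names that do not appear in the original script, and StubRating is an offline stand-in with made-up ratings so the example runs without hitting the website. In real use you would pass convrt.RottenTomatoesRating instead of the stub.

```python
class StubRating(object):
    """Hypothetical stand-in for convrt.RottenTomatoesRating (no network access)."""

    # Made-up values, purely for illustration.
    _FAKE_DATA = {
        "Movie A": ("90%", "85%"),
        "Movie B": ("40%", "55%"),
    }

    def __init__(self, title):
        self.title = title
        # Like the real class, leave the attributes as None when nothing is found.
        self.tomatometer, self.audience = self._FAKE_DATA.get(title, (None, None))
        self.found = self.tomatometer is not None


def collect_ratings(titles, rating_cls):
    """Return a dict mapping each title to its (tomatometer, audience) tuple."""
    results = {}
    for title in titles:
        rating = rating_cls(title)  # __init__ already does the lookup
        results[title] = (rating.tomatometer, rating.audience)
    return results


ratings = collect_ratings(["Movie A", "Movie B", "Unknown"], StubRating)
for title, (tomatometer, audience) in sorted(ratings.items()):
    print("%s: %s / %s" % (title, tomatometer, audience))
```

Only public attributes are read here, so no private method is touched and PyCharm has nothing to warn about. Titles that cannot be resolved come back as (None, None), matching the class-level defaults in the original script.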