Question

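I'm creating a program in Python that searches for TV shows/movies and, from IMDb, gives you:

the title, year, rating, age rating, and plot synopsis of the movie.

I want to use no external modules, only the modules that ship with Python 3.4.

I know I have to use urllib, but I don't know where to go from there.

How do I do this?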

Answer 1

Here is an example (taken from here):

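import json
from urllib.parse import quote
from urllib.request import urlopen

def search(title):
    # OMDb's search endpoint (s=) returns a JSON list of matching titles.
    # Note: OMDb now also requires an API key, passed as an extra &apikey=... parameter.
    API_URL = "http://www.omdbapi.com/?r=json&s=%s"
    url = API_URL % quote(title)  # percent-encodes spaces and non-ASCII characters as UTF-8
    data = urlopen(url).read().decode("utf-8")
    data = json.loads(data)
    if data.get("Response") == "False":
        # On failure OMDb sets Response to "False" and explains why in Error
        print(data.get("Error", "Unknown error"))

    return data.get("Search", [])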

Then you can do:

>>> search("Idiocracy")
[{'Year': '2006', 'imdbID': 'tt0387808', 'Title': 'Idiocracy'}]
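The search above only returns Title, Year and imdbID for each hit, while the question also asks for the rating, age rating and plot. A minimal sketch of a second, per-title request, assuming OMDb's `i` (IMDb ID) and `plot` parameters and its `imdbRating`, `Rated` and `Plot` response fields; check these against the current OMDb documentation. `details`, `details_url`, `pick_fields` and the `YOUR_KEY` placeholder are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: OMDb's lookup-by-ID endpoint; "YOUR_KEY" is a placeholder for your own OMDb API key.
API_URL = "http://www.omdbapi.com/"

# Fields the question asks for; the names are assumed from OMDb's JSON responses.
WANTED = ("Title", "Year", "imdbRating", "Rated", "Plot")


def details_url(imdb_id, api_key="YOUR_KEY"):
    """Build a lookup URL; a list of pairs keeps the parameter order stable."""
    params = [("i", imdb_id), ("plot", "short"), ("r", "json"), ("apikey", api_key)]
    return API_URL + "?" + urlencode(params)


def pick_fields(data):
    """Reduce a decoded OMDb response to the wanted fields (None when a field is missing)."""
    return {key: data.get(key) for key in WANTED}


def details(imdb_id, api_key="YOUR_KEY"):
    """Fetch one title by its imdbID and return only the wanted fields."""
    raw = urlopen(details_url(imdb_id, api_key)).read().decode("utf-8")
    return pick_fields(json.loads(raw))
```

For example, `[details(hit["imdbID"]) for hit in search("Idiocracy")]` would look up every search result, at the cost of one extra HTTP request per hit.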

Answer 2

This may be overcomplicated, but here is what I do: I look at the page's HTML source, find the information I want, and extract it.

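import urllib.parse
import urllib.request

def search(title):
    # Search IMDb; quote() percent-encodes spaces and non-ASCII characters in the title
    html = urllib.request.urlopen("http://www.imdb.com/find?q=" + urllib.parse.quote(title)).read().decode("utf-8")
    # Link of the first result; 34 == len('<td class="result_text"> <a href="')
    f = html.find("<td class=\"result_text\"> <a href=\"", 0) + 34
    openlink = ""
    while html[f] != "\"":
        openlink += html[f]
        f += 1
    # Load the title's own page
    html = urllib.request.urlopen("http://www.imdb.com" + openlink).read().decode("utf-8")
    # Title and year; 35 == len("<meta property='og:title' content=\"")
    f = html.find("<meta property='og:title' content=\"", 0) + 35
    titleyear = ""
    while html[f] != "\"":
        titleyear += html[f]
        f += 1

    # User rating, read up to the "/"; 24 == len('title="Users rated this ')
    f = html.find("title=\"Users rated this ", 0) + 24
    rating = ""
    while html[f] != "/":
        rating += html[f]
        f += 1

    # Short plot description; 34 == len('<meta name="description" content="')
    f = html.find("<meta name=\"description\" content=\"", 0) + 34
    shortdescription = ""
    while html[f] != "\"":
        shortdescription += html[f]
        f += 1
    print(titleyear, rating, shortdescription)
    return (titleyear, rating, shortdescription)

search("friends")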

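The numbers added to f have to be exactly right: you must count the length of the string you are searching for, because find() returns the position of that string's first character, not the position just after it.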
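Instead of counting characters by hand, the offset can be computed with len(). A minimal sketch; `extract_between` is a hypothetical helper name. It also returns None when find() gives -1 (marker not on the page), a case the loops above silently turn into a wrong offset (-1 + 34 = 33).

```python
def extract_between(html, marker, stop='"', start=0):
    """Return the text after `marker` up to the next `stop` character, or None if either is missing."""
    pos = html.find(marker, start)
    if pos == -1:
        return None  # find() returns -1 when the marker is not on the page
    begin = pos + len(marker)  # skip past the marker itself, no hand-counted offset needed
    end = html.find(stop, begin)
    if end == -1:
        return None
    return html[begin:end]  # slicing replaces the character-by-character while loop
```

With it, the rating lookup becomes `rating = extract_between(html, 'title="Users rated this ', "/")` and the description becomes `extract_between(html, '<meta name="description" content="')`.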

It looks ugly; is there a simpler way?
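One standard-library alternative (not what this answer used) is `html.parser.HTMLParser`, which splits each tag into its attributes, so single vs. double quotes and attribute order stop mattering. A minimal sketch that collects `<meta>` tags, assuming the title page still exposes `og:title` and `description` meta tags; the user rating is not a meta tag in the code above and would still need a marker search. `MetaParser` and `read_meta` are hypothetical names.

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Collect the content of <meta> tags, keyed by their property= or name= attribute."""

    def __init__(self):
        # convert_charrefs=True decodes character references in text data (the default from Python 3.5)
        super().__init__(convert_charrefs=True)
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        key = attrs.get("property") or attrs.get("name")
        content = attrs.get("content")
        if key and content is not None:
            self.meta[key] = content


def read_meta(html):
    """Parse an HTML string and return a {property-or-name: content} dict."""
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    return parser.meta
```

Usage: `meta = read_meta(html)`, then `meta.get("og:title")` and `meta.get("description")` replace the two corresponding while loops.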

