How can I use Python to scrape web pages based on their title?

Asked: 2019-05-04 16:44:06

Tags: python http screen-scraping

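I am scraping a website for pages that have a specific title. How would I make it check, for example, "example.com/xxxxxxxxxx", where each "x" is a random digit, to see whether the page title is 404?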

1 answer:

Answer 0: (score: 0)

This finds the page title:

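import requests
from lxml.html import fromstring

def Get_PageTitle(url):
    # Download the page and parse the returned HTML
    req = requests.get(url)
    tree = fromstring(req.content)
    # findtext returns None when the page has no <title> element
    title = tree.findtext('.//title')
    return title


url = "http://www.google.com"
title = Get_PageTitle(url)

if title is not None and "404" in title:
    # the title contains "404"
    print("Title has 404 in it")

else:
    # no "404" in the title (or no title at all)
    pass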

Edit:

The code above checks whether the title contains 404. If you want to know whether the title is exactly 404, use the following code:

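import requests
from lxml.html import fromstring

def Get_PageTitle(url):
    # Download the page and parse the returned HTML
    req = requests.get(url)
    tree = fromstring(req.content)
    # findtext returns None when the page has no <title> element
    title = tree.findtext('.//title')
    return title


url = "http://www.google.com"
title = Get_PageTitle(url)

# Compare with ==, not "is": "is" tests object identity, not string equality
if title is not None and title.strip() == "404":
    # the title is exactly "404"
    print("Title is 404")
    print(title)

else:
    # the title is not "404"
    pass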
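
The question also asks about trying addresses of the form "example.com/xxxxxxxxxx" with random digits. Here is a minimal sketch of that loop, not a definitive implementation. The names random_url, extract_title, fetch_title and find_urls_with_title are hypothetical helpers introduced for illustration. To stay self-contained it uses only the standard library (urllib and a simple regular expression) instead of requests and lxml, and it assumes pages are UTF-8 and have a plain <title> element.

```python
import random
import re
import urllib.error
import urllib.request


def random_url(base, digits=10, rng=random):
    """Build base + '/' + a random digit string (assumed URL scheme)."""
    suffix = "".join(rng.choice("0123456789") for _ in range(digits))
    return f"{base.rstrip('/')}/{suffix}"


def extract_title(html):
    """Return the stripped <title> text, or "" if none is found.

    A regex is a simplification; a real HTML parser is more robust.
    """
    match = re.search(r"<title[^>]*>(.*?)</title>", html,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


def fetch_title(url, timeout=10):
    """Download a page and return its title."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        # An error response (e.g. status 404) still has a body with a title.
        body = err.read()
    return extract_title(body.decode("utf-8", errors="replace"))


def find_urls_with_title(base, wanted, attempts=5, get_title=fetch_title):
    """Try `attempts` random URLs; return those whose title equals `wanted`."""
    hits = []
    for _ in range(attempts):
        url = random_url(base)
        if get_title(url) == wanted:
            hits.append(url)
    return hits
```

Calling find_urls_with_title("http://example.com", "404", attempts=20) would return the random addresses whose title is exactly 404. The get_title parameter lets you pass Get_PageTitle from the answer above instead of the standard-library fetcher. Network failures other than HTTP error responses are not caught here, and a real crawl should pause between requests.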
