Now I want to scrape the keywords and description meta tags from a web page, like the ones shown below:
<html>
<head>
<title>test page</title>
<meta name="keywords" content="A,B,C">
<meta name="description" content="the description a page">
....
I searched Google today but could not work out how to do it. Please give me some suggestions.
Answer 0 (score: 0)
You don't even need scrapy for this. You can do it with the standard library class HTMLParser.
#!/usr/bin/python3
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 fallback


class MyHTMLParse(HTMLParser):
    TAG = "meta"
    NAMES = ['keywords', 'description']

    def __init__(self):
        HTMLParser.__init__(self)
        # Maps each meta name that was found to its content attribute
        self.contents = {}

    def handle_starttag(self, tag, attrs):
        if tag == MyHTMLParse.TAG:
            # attrs is a list of (name, value) pairs
            attributes = dict(attrs)
            if attributes.get("name") in MyHTMLParse.NAMES:
                # .get avoids a KeyError when the tag has no content attribute
                self.contents[attributes["name"]] = attributes.get("content")


parser = MyHTMLParse()
# Feed the page's HTML to the parser with parser.feed(), then read the results
# from parser.contents, a dictionary with keys "keywords" and "description"
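
To make the last step concrete, here is a minimal, self-contained sketch of feeding a page to such a parser and reading the results. It assumes Python 3 and uses the sample HTML from the question as a string; the names `MetaTagParser`, `extract_meta` and `SAMPLE_HTML` are made up for this illustration and are not part of any library.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the content of selected <meta name="..."> tags (illustrative name)."""

    WANTED = ("keywords", "description")

    def __init__(self):
        super().__init__()
        self.contents = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # attrs is a list of (name, value) pairs
        name = attributes.get("name")
        if name in self.WANTED:
            self.contents[name] = attributes.get("content")


def extract_meta(html_text):
    """Hypothetical helper: parse an HTML string and return the collected meta tags."""
    parser = MetaTagParser()
    parser.feed(html_text)
    return parser.contents


# Sample page taken from the question, closed off so it is a complete document.
SAMPLE_HTML = """<html>
<head>
<title>test page</title>
<meta name="keywords" content="A,B,C">
<meta name="description" content="the description a page">
</head>
</html>"""

meta = extract_meta(SAMPLE_HTML)
print(meta["keywords"].split(","))  # split the comma-separated keywords into a list
print(meta["description"])
```

For a live page you would first download the HTML as text (for example with `urllib.request.urlopen` and decoding the response bytes) and pass that string to `feed()`. Pages that omit either tag simply leave that key out of the dictionary, so `meta.get("keywords")` is the safer lookup there.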