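BeautifulSoup not picking up meta tag

Time: 2018-04-20 21:35:59

Tags: python beautifulsoup meta

I have a simple script that fetches an HTML page and tries to output the content of the keywords meta tag. Somehow it does not pick up the content of the keywords meta tag, even though the HTML contains the tag. Any help is appreciated.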

    import urllib2
    from bs4 import BeautifulSoup

    url = "https://www.mediapost.com/publications/article/316086/google-facebook-others-pitch-in-app-ads-brand-s.html"
    req = urllib2.Request(url=url)
    f = urllib2.urlopen(req)
    mycontent = f.read()
    soup = BeautifulSoup(mycontent, 'html.parser')
    keywords = soup.find("meta", property="keywords")
    print keywords

3 answers:

Answer 0 (score: 1)

I strongly recommend using requests.

Code:

    from bs4 import BeautifulSoup
    import requests

    # Same article URL as in the question
    url = "https://www.mediapost.com/publications/article/316086/google-facebook-others-pitch-in-app-ads-brand-s.html"
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    # The tag carries a name attribute, so select on meta[name="keywords"]
    keywords = soup.select_one('meta[name="keywords"]')['content']

    >>> keywords
    'Many more major brands are pumping big ad dollars into mobile games, pushing Google, Facebook and others into the in-app gaming ad space. Some believe this is in response to brands searching for a secure, safe place to run video ads and engage with consumers. 03/16/2018'
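
One caveat with the snippet above: `select_one` returns `None` when nothing matches, so subscripting the result raises a `TypeError` on a page that lacks the tag. Here is a minimal sketch of a more defensive version. The `get_meta_content` helper and the inline `SAMPLE_HTML` are hypothetical, used only so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for the HTML fetched with requests.
SAMPLE_HTML = """
<html><head>
<meta name="keywords" content="mobile games, in-app ads">
<meta property="og:title" content="Example title">
</head><body></body></html>
"""


def get_meta_content(html, name):
    """Return the content of <meta name=...>, or None if the tag is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one('meta[name="{}"]'.format(name))
    # select_one returns None when nothing matches, so guard before reading.
    return tag.get("content") if tag is not None else None


print(get_meta_content(SAMPLE_HTML, "keywords"))     # mobile games, in-app ads
print(get_meta_content(SAMPLE_HTML, "description"))  # None
```

Returning `None` rather than raising lets the caller decide how to handle a page without the tag.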

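Answer 1 (score: 0)

Use 'lxml' instead of 'html.parser', and use soup.find_all:

    from bs4 import BeautifulSoup

    # doc is the page HTML, e.g. mycontent from the question
    soup = BeautifulSoup(doc, 'lxml')
    keywords = soup.find_all('meta', attrs={"name": 'keywords'})
    for x in keywords:
        print(x['content'])

Output: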

Many more major brands are pumping big ad dollars into mobile games, pushing Google, Facebook and others into the in-app gaming ad space. Some believe this is in response to brands searching for a secure, safe place to run video ads and engage with consumers. 03/16/2018
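
The 'lxml' parser is a separate package, and Beautiful Soup raises `FeatureNotFound` if it is missing. Here is a small sketch that falls back to the built-in parser in that case. The `make_soup` helper and the inline `doc` string are hypothetical illustrations, not part of the answer above.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical markup standing in for the downloaded page.
doc = """
<html><head>
<meta name="keywords" content="first, second">
<meta name="description" content="A short summary.">
</head><body></body></html>
"""


def make_soup(markup):
    """Parse with lxml when it is installed, otherwise with html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(markup, "html.parser")


soup = make_soup(doc)
# find_all returns an empty list when nothing matches, so the loop or
# comprehension simply does nothing instead of raising.
contents = [tag.get("content")
            for tag in soup.find_all("meta", attrs={"name": "keywords"})]
print(contents)  # ['first, second']
```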

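Answer 2 (score: 0)

If you look closely, the attribute on the meta tag you are after is name, not property. Change your code to:

    keywords = soup.find("meta", attrs={'name': 'keywords'})

Then, to display the content, write:

    print(keywords['content'])

Output: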

  

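Many more major brands are pumping big ad dollars into mobile games, pushing Google, Facebook and others into the in-app gaming ad space. Some believe this is in response to brands searching for a secure, safe place to run video ads and engage with consumers. 03/16/2018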
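
To make the difference concrete, here is a minimal sketch on a hypothetical inline snippet (not the real page) that contrasts the two lookups. It assumes only that the page declares its keywords with a `name` attribute, as described above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the two common styles of meta tag.
html = """
<head>
<meta name="keywords" content="ads, mobile, games">
<meta property="og:type" content="article">
</head>
"""
soup = BeautifulSoup(html, "html.parser")

# Keyword arguments are matched against attribute names, so this searches
# for <meta property="keywords">, which does not exist here.
by_property = soup.find("meta", property="keywords")

# name= is find()'s own parameter (the tag name), so the HTML attribute
# called "name" has to be passed through attrs instead.
by_name = soup.find("meta", attrs={"name": "keywords"})

print(by_property)         # None
print(by_name["content"])  # ads, mobile, games
```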