When I use urllib2 to crawl a website, the response contains no tags such as html or body

Asked: 2017-05-10 01:59:09

Tags: python urllib2 labels

import urllib2

url = 'http://www.bilibili.com/video/av1669338'
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
headers = {"User-Agent": user_agent}

request = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(request)
text = response.read()

text[:100]

'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xcd}YS\x1bG\xb2\xe7\xdfV\xc4|...'

2 answers:

Answer 0: (score: 1)

import requests
from bs4 import BeautifulSoup

def data():
    url = 'http://www.bilibili.com/video/av1669338'
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
    headers = {"User-Agent": user_agent}
    # requests decompresses the response body transparently
    response = requests.get(url, headers=headers)

    data = response.content
    # name the parser explicitly so bs4 does not have to guess one
    _html = BeautifulSoup(data, "html.parser")
    _meta = _html.head.select('meta[name=keywords]')
    print _meta[0]['content']
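
A note on why switching to requests helps: the output in the question starts with '\x1f\x8b', which is the gzip magic number, so the server appears to have sent a compressed body. urllib2 returns it as-is, while requests decompresses it. If you want to stay with urllib2, here is a minimal sketch of decompressing the body yourself. The helper name decode_body and the sample HTML are made up for illustration, and the demo compresses its own sample rather than contacting the site.

```python
import gzip
import io
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_body(raw, content_encoding=None):
    """Return the decompressed body if `raw` looks gzip-compressed, else `raw` unchanged.

    `decode_body` is a hypothetical helper, not part of urllib2.
    """
    if content_encoding == "gzip" or raw[:2] == GZIP_MAGIC:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
        return zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    return raw


# Offline demonstration with made-up HTML (assumption: not the real page)
sample_html = b"<html><head><title>demo</title></head><body></body></html>"
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(sample_html)
compressed = buf.getvalue()

restored = decode_body(compressed)
```

With the code from the question, this would be used as text = decode_body(response.read(), response.info().get("Content-Encoding")), after which text[:100] should show ordinary HTML.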

Answer 1: (score: 0)

Try this:

import bs4, requests
res = requests.get("http://www.bilibili.com/video/av1669338")
soup = bs4.BeautifulSoup(res.content, "lxml")
result = soup.find("meta", attrs = {"name":"keywords"}).get("content")
print result
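
If bs4 or lxml is not installed, the same meta lookup can be sketched with the standard library's HTMLParser instead. This swaps in a different parser from the one used above; the class name MetaKeywordsParser, the function extract_keywords, and the sample HTML are assumptions for illustration only.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class MetaKeywordsParser(HTMLParser):
    """Remembers the content of the first <meta name="keywords"> tag it sees."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.keywords = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "keywords" and self.keywords is None:
            self.keywords = attrs.get("content")


def extract_keywords(html_text):
    """Return the keywords string, or None if the tag is absent."""
    parser = MetaKeywordsParser()
    parser.feed(html_text)
    return parser.keywords


# Made-up sample document (assumption: not the real page)
sample = '<html><head><meta name="keywords" content="python,urllib2"></head><body></body></html>'
keywords = extract_keywords(sample)
```

Unlike soup.find(...).get("content"), which raises AttributeError when the tag is missing, this sketch returns None in that case.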