How to extract the content of Google's feedback box using Python?

Time: 2019-09-22 13:31:41

Tags: python-3.x ubuntu web-scraping beautifulsoup

I have been trying to run the code on Ubuntu 16.04 from the terminal like this:

python3 myscript.py

I want to pass any query and get back only the text from the feedback box. The whole code is below.

#! /usr/bin/env python3.5
# myscript.py

import sys
import urllib.parse

import requests
from bs4 import BeautifulSoup

# Join the command-line words with spaces and fall back to the example question
# when no arguments are given. (Calling str.join on the question used it as the
# separator, so running the script without arguments produced an empty query.)
searchterm = ' '.join(sys.argv[1:]) or 'What animal is the mascot for Linux'
searchterm = urllib.parse.quote_plus(searchterm)
url = 'https://www.google.com/search?q=define+' + searchterm
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'
}

try:
    res = requests.get(url, headers=headers)
    res.raise_for_status()
except requests.RequestException as exc:
    # Stop here instead of parsing an error page.
    sys.exit('error while loading page occurred: ' + str(exc))

# Parse the response directly: BeautifulSoup decodes HTML entities itself, so
# running html.unescape() first is unnecessary and can corrupt the markup.
soup = BeautifulSoup(res.text, 'lxml')

# The next lines save the raw page for analysis; comment them out if not needed.
with open('rawpage.txt', 'w', encoding='utf-8') as frawpage:
    frawpage.write(soup.prettify())

firsttag = soup.find('h3', class_='r')
if firsttag is not None:
    print(firsttag.get_text())
    print()

# The second tag may change, so check it if it does not return the correct
# result. The same may be true for every tag searched for here.
secondtag = soup.find('div', {'style': 'color:#666;padding:5px 0'})
if secondtag is not None:
    print(secondtag.get_text())
    print()

termtags = soup.find_all('li', {'style': 'list-style-type:decimal'})

# Number each matching list item, starting from 1.
for count, tag in enumerate(termtags, start=1):
    print(str(count) + '. ' + tag.get_text())
    print()
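One likely reason nothing is printed is the query itself. The line 'What animal is the mascot for Linux'.join(sys.argv[1:]) uses the question as the separator for str.join, so running python3 myscript.py with no arguments searches for an empty string, and any arguments get the whole question glued between them. The sketch below uses only the standard library; DEFAULT_QUERY, build_query and build_search_url are hypothetical names, and it assumes the define+ prefix is not wanted for general questions, so it leaves it out.

```python
import sys
from urllib.parse import urlencode

# Hypothetical default, taken from the question's example query.
DEFAULT_QUERY = "What animal is the mascot for Linux"


def build_query(args, default=DEFAULT_QUERY):
    """Join command-line words with spaces; use the default if there are none."""
    return " ".join(args) or default


def build_search_url(query):
    # urlencode percent-encodes the value and turns spaces into '+'.
    # Assumption: no 'define+' prefix, since this is a general question.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # What the original line does: the question becomes the separator.
    print(DEFAULT_QUERY.join(["tux", "mascot"]))  # tuxWhat animal is the mascot for Linuxmascot
    print(repr(DEFAULT_QUERY.join([])))  # ''
    print(build_search_url(build_query(sys.argv[1:])))
```

With no arguments the last line prints https://www.google.com/search?q=What+animal+is+the+mascot+for+Linux; with arguments, the words are simply joined with spaces.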

It runs but prints nothing. What is going on here? I don't understand it and I'm confused. Any help would be appreciated. Thanks, everyone.
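Even with a correct query, soup.find(...) returns None whenever the page has no element with that exact tag and class, and the if checks then skip printing without any message. Selectors such as h3 with class "r" depend on Google's result markup, which changes over time and may differ for clients that are not real browsers, so a silent empty result is the expected symptom. To illustrate the matching only (this is not Google's actual HTML), the sketch below uses the standard-library html.parser to roughly mimic a tag-plus-class lookup; ClassTextCollector and find_texts are hypothetical names, and both HTML snippets are invented.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text inside every <tag class="... cls ..."> element."""

    def __init__(self, tag, cls):
        super().__init__()
        self.tag, self.cls = tag, cls
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text pieces of the element being read
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested element with the same tag name
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.cls in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def find_texts(html_text, tag, cls):
    """Return the text of all elements with the given tag and CSS class."""
    parser = ClassTextCollector(tag, cls)
    parser.feed(html_text)
    parser.close()
    return parser.results


if __name__ == "__main__":
    # Invented snippets: the same title with an old and a changed class name.
    old_markup = '<div><h3 class="r"><a href="#">Tux the penguin</a></h3></div>'
    new_markup = '<div><h3 class="new-title-class"><span>Tux the penguin</span></h3></div>'
    print(find_texts(old_markup, "h3", "r"))  # ['Tux the penguin']
    print(find_texts(new_markup, "h3", "r"))  # []
```

Running the same kind of lookup against the saved rawpage.txt is a quick way to check whether a selector still matches before blaming the rest of the script.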

0 answers