Beginner Python web scraping question

Time: 2020-08-05 10:12:03

Tags: python web-scraping beautifulsoup

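I'm new to web scraping and would really appreciate your help! I want to run a search and print its results, but the script fails with a runtime error. My current code is below: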

from googlesearch import search
import requests
from bs4 import BeautifulSoup

print('Please enter your first name')
firstName = input()
print('Please enter your surname')
secondName = input()
query = firstName + ' ' + secondName
print('Please enter language ex:[en,fr,ar,jp,cn...]: ')
lang = input()

# requests
url = 'https://www.google.com/search?hl={}&q={}&start=3i&num=10&ie=UTF-8'.format(lang, query)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
# url source
source = requests.get(url, headers=headers).text

# BeautifulSoup
soup = BeautifulSoup(source, 'lxml')
# find all divs that contain search result
search_div = soup.find_all(class_='rc')
for result in search_div:
    # loop result list
    # getting h3
    print('Title: %s'%result.h3.string)
    print('\n')
    # getting a.href
    print('Url: %s'%result.a.get('href'))
    print('\n')
    # description
    print('Description: %s'%result.find(class_='st').text)
    print('\n###############\n')

But I get this error:

Traceback (most recent call last):
  File "/Users/axy/PycharmProjects/Name_Search/main.py", line 20, in <module>
    soup = BeautifulSoup(source, 'lxml')
  File "/Users/axy/PycharmProjects/Name_Search/venv/lib/python3.8/site-packages/bs4/__init__.py", line 242, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Process finished with exit code 1

I'm a beginner and would really appreciate some guidance. I'd also like someone to explain what this line means:

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'} 

Thanks!

2 answers:

Answer 0: (score: 0)

It looks like you need to install lxml.

Just run pip install lxml and it should work.
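
As a quick way to confirm the fix, the sketch below checks whether lxml is importable and otherwise falls back to Python's built-in html.parser, which BeautifulSoup accepts without any extra install. The helper name pick_parser and the sample HTML string are made up for illustration and are not part of the question's code.

```python
import importlib.util


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return `preferred` if a package of that name is importable, else `fallback`.

    Hypothetical helper: it relies on the parser name ("lxml") matching the
    name of the package that provides it.
    """
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


parser = pick_parser()
print("Using parser:", parser)

# Guarded so the sketch still runs where bs4 itself is missing.
if importlib.util.find_spec("bs4") is not None:
    from bs4 import BeautifulSoup

    # Sample markup standing in for the downloaded page source.
    soup = BeautifulSoup("<h3>Example title</h3>", parser)
    print("Title:", soup.h3.string)
```

If the script runs inside a virtual environment, as the venv path in the traceback suggests, lxml has to be installed into that same environment, for example by running python -m pip install lxml with that environment's interpreter.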

Answer 1: (score: 0)

In the line soup = BeautifulSoup(source, 'lxml') you are using the lxml parser without having installed it.

Run pip install lxml and it should work.
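
The question also asked what the headers line means, which neither answer covers. In short, the dictionary sets the User-Agent HTTP header so that the request identifies itself as a Firefox browser instead of as the requests library. The sketch below builds a request without sending it, so the header can be inspected offline; the search term "Ada Lovelace" is a placeholder and not taken from the question.

```python
import requests

# The header dictionary from the question: it replaces the User-Agent that
# requests would otherwise send.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}

# What requests identifies itself as when no User-Agent is supplied.
default_agent = requests.utils.default_user_agent()
print('Default User-Agent:', default_agent)

# Build the request without sending it, so no network access is needed.
# Passing params lets requests URL-encode the query (the space in the name).
request = requests.Request(
    'GET',
    'https://www.google.com/search',
    params={'hl': 'en', 'q': 'Ada Lovelace'},  # placeholder values
    headers=headers,
)
prepared = request.prepare()
print('Custom User-Agent:', prepared.headers['User-Agent'])
print('Encoded URL:', prepared.url)
```

Some sites may respond differently to the default python-requests identifier than to a browser-like one, which is a common reason for setting this header in scraping scripts.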