Python: web scraping with user input

Date: 2021-01-04 01:44:40

Tags: python web web-scraping

from bs4 import BeautifulSoup
from urllib.request import urlopen as uReq
import requests

url = 'https://en.wikisource.org/wiki/Main_Page'
r = requests.get(url)

Soup = BeautifulSoup(r.text, "html5lib")
List = Soup.find("div",class_="enws-mainpage-widget-content", id="enws-mainpage-newtexts-content").find_all('a')
ebooks=[]
i=0
for ebook in List:
    x=ebook.get('title')
    for ch in x:
        if(ch==":"):
            x=""
    if x!="":
        ebooks.append(x)
        i=i+1
        
print("Please select a book: ")
inputnumber=0
while inputnumber<len(ebooks):
    print(inputnumber+1, " - ", ebooks[inputnumber])
    inputnumber=inputnumber+1
input=int(input())
selectedbook = Soup.find("href", title=ebooks[input-1])
print(selectedbook)
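The title filtering and the numbered menu above can be written more compactly. What follows is a minimal sketch, assuming the titles have already been scraped into a list (the sample titles are made up). It also skips links that have no title attribute, where the loop above would raise a TypeError on `for ch in x`, and it rejects non-numeric or out-of-range answers. The helper names `filter_titles` and `parse_choice` are hypothetical, and the menu answer goes into its own variable so that the built-in `input` is never shadowed (the original `input=int(input())` rebinds it).

```python
def filter_titles(titles):
    """Keep titles that exist and contain no ':' (mirrors the loop in the question)."""
    return [t for t in titles if t and ":" not in t]


def parse_choice(raw, count):
    """Convert a 1-based menu answer to a 0-based index, or return None if invalid."""
    try:
        n = int(raw.strip())
    except ValueError:
        return None
    return n - 1 if 1 <= n <= count else None


# Hypothetical sample data standing in for the scraped 'title' attributes
titles = filter_titles(["Book A", None, "Portal:Fiction", "Book B"])
for number, title in enumerate(titles, start=1):
    print(number, "-", title)

index = parse_choice("2", len(titles))  # in the real script: parse_choice(input(), len(titles))
if index is not None:
    print(titles[index])  # Book B
```

In the interactive version, the menu would simply be shown again whenever `parse_choice` returns None.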

I want to get the href of the book the user selected, but the output I get is None.

Can anyone tell me what I'm doing wrong?

1 answer:

Answer 0 (score: 0)

I changed the last two lines of your code to these:

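selectedbook = Soup.find("a", title=ebooks[input-1])  # "a" is the tag name; href is one of its attributes
print(selectedbook['title'])
print("https://en.wikisource.org" + selectedbook['href'])  # href is site-relative and already starts with "/"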

This works!
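Building the full URL by hand is easy to get wrong. If the base ends in "/" and the href starts with "/", plain concatenation produces "//" between host and path. The standard library's `urllib.parse.urljoin` resolves relative, site-relative and absolute hrefs correctly. Below is a minimal sketch. The helper name `absolute_url` is hypothetical, and the `/wiki/...` shape of the href is an assumption about how Wikisource links look.

```python
from urllib.parse import urljoin

BASE = "https://en.wikisource.org/"


def absolute_url(href):
    # urljoin resolves a site-relative href such as "/wiki/..." against BASE,
    # and returns an href that already has a scheme unchanged
    return urljoin(BASE, href)


print(absolute_url("/wiki/Example_Title"))  # https://en.wikisource.org/wiki/Example_Title
```

In the answer's code this would become `print(absolute_url(selectedbook['href']))`.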

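Note: the find() method searches for the first tag with the specified name (and any given attributes) and returns it as a bs4.element.Tag object, or None if nothing matches. "href" is an attribute, not a tag name, so Soup.find("href", ...) never matches anything. The link you want is an <a> tag whose href attribute holds the URL.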
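The difference between searching by tag name and by attribute can be checked offline. Below is a minimal sketch that uses a small hand-written HTML fragment (the titles and hrefs are made up) instead of the live page. It uses Python's built-in "html.parser" rather than html5lib, so beautifulsoup4 is the only package it needs.

```python
from bs4 import BeautifulSoup

# Made-up fragment standing in for the Wikisource "new texts" widget
html = """
<div id="enws-mainpage-newtexts-content">
  <a href="/wiki/Book_A" title="Book A">Book A</a>
  <a href="/wiki/Book_B" title="Book B">Book B</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.find("href", title="Book B"))  # None: there is no <href> tag
link = soup.find("a", title="Book B")     # matches <a ... title="Book B">
print(type(link).__name__)                # Tag
print(link["href"])                       # /wiki/Book_B
print(link.get("rel"))                    # None: .get() returns None for a missing attribute
```

Indexing with `link["href"]` raises KeyError when the attribute is absent, while `link.get("href")` returns None. Use whichever fits how you want missing links handled.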