How do I scrape content from links stored in a list using Beautiful Soup?

Asked: 2018-07-22 12:04:46

Tags: python web-scraping beautifulsoup python-requests

I want to scrape the titles of all the posts on a main site. main is a list containing 6 or 7 URLs:

import requests 
from bs4 import Beautifulsoup 

r=requests.get("https://forums.oneplus.com/")
s=BeautifulSoup(r.content)
links=s.find_all("a",{"class" : "focus-content"})
url2=[] 
for link in links:
    url2.append(link.get("href"))

url1="https://forums.oneplus.com/"
for u in url2:
    main=url1+u
    print(main)

for m in main:
    r1=requests.get(m)
    s1=BeautifulSoup(r1)
    title=s1.find("span", {"class" : "title"})
    print(title)

1 answer:

Answer 0 (score: 0)

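You need to declare the variable main as a list. In your code, main is reassigned on every iteration of the loop, so when the loop finishes, main is a string holding only the last URL from the url2 list. If you then feed main to the next loop, it iterates over the individual characters of that string rather than over URLs.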
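The difference can be seen without any network access. The following is a minimal sketch, assuming two made-up href values in place of the ones scraped from the forum (the thread paths are hypothetical, as is the name main_list); it only illustrates string reassignment versus appending to a list.

```python
url1 = "https://forums.oneplus.com/"
# Hypothetical hrefs standing in for the values collected in url2.
url2 = ["threads/example-one.1/", "threads/example-two.2/"]

# Reassigning inside the loop: main ends up as one string (the last URL).
for u in url2:
    main = url1 + u

# Iterating over a string yields single characters, not URLs.
first_chars = [m for m in main][:5]
print(type(main).__name__, first_chars)

# Appending to a list keeps every URL.
main_list = []
for u in url2:
    main_list.append(url1 + u)
print(main_list)
```

With the string version, each requests.get(m) call would receive a single character such as "h", which is not a valid URL.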

After a few cosmetic changes, you should get the titles:

import requests
from bs4 import BeautifulSoup

r=requests.get("https://forums.oneplus.com/")
s=BeautifulSoup(r.content, 'lxml')
links=s.find_all("a",{"class" : "focus-content"})
url2=[]
for link in links:
    url2.append(link.get("href"))

url1 = "https://forums.oneplus.com/"
main = []
for u in url2:
    main.append(url1 + u)

for m in main:
    r1 = requests.get(m)
    s1 = BeautifulSoup(r1.text, 'lxml')
    title=s1.find("span", {"class" : "title"})
    print(title.text.strip())

Prints:

Weekly 240: We release the updates and get reading
Shot on OnePlus: Part 6 – Best Slo-mo Video / Animal Photos
Android P Beta Developer Preview 3 for OnePlus 6
[Let's Talk] To whom are you gonna give your appreciation in this Community?
[Let's Talk] What Does Your OnePlus Device Replace?
[Let's Talk]  Loyalty to tech companies
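As an optional variation on building the list of URLs, the concatenation url1 + u could be replaced with urllib.parse.urljoin from the standard library, which also copes with hrefs that start with a slash or are already absolute. This is only a sketch under assumptions: the href values below are invented, and build_urls is a hypothetical helper name, not part of the answer above.

```python
from urllib.parse import urljoin

BASE_URL = "https://forums.oneplus.com/"

def build_urls(base, hrefs):
    """Return absolute URLs for every non-empty href (hypothetical helper)."""
    # link.get("href") returns None when the attribute is missing, so skip those.
    return [urljoin(base, href) for href in hrefs if href]

# Invented hrefs covering relative, root-relative, missing, and absolute cases.
sample_hrefs = [
    "threads/example-one.1/",
    "/threads/example-two.2/",
    None,
    "https://forums.oneplus.com/threads/example-three.3/",
]

main = build_urls(BASE_URL, sample_hrefs)
for m in main:
    print(m)
```

The resulting list can be passed to the same for m in main loop as before.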