I am trying to do some web scraping with BeautifulSoup, and I would like to be able to use the .select() method with the CSS :nth-child() filter.
Is this functionality implemented?
Is there a better way to extract specific elements with BeautifulSoup?
    # import dependencies
    from bs4 import BeautifulSoup
    import requests
    import json


    def getSoup(url):
        # fetch the raw page
        source_code = requests.get(url)
        # convert the response to text
        plain_text = source_code.text
        # parse the text with the lxml parser
        soup = BeautifulSoup(plain_text, 'lxml')
        return soup


    # get data from site
    baseUrl = "https://stackoverflow.com/questions/"
    questionId = 48139550

    # create our URL (questionId is an int, so convert it before concatenating)
    url = baseUrl + str(questionId)

    try:
        page_soup = getSoup(url)
        poi = page_soup.select('#sidebar > div.module.community-bulletin > div > div:nth-child(4) > div.bulletin-item-content > a')
        print(poi)
    except Exception as e:
        print(e)

I know there is the :nth-of-type() method, but it is not that intuitive.
Any ideas about it?
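For reference, here is a minimal sketch of how the two pseudo-classes differ when passed to .select(). The markup is invented for this illustration and is not the real Stack Overflow sidebar. It also assumes Beautiful Soup 4.7.0 or later, where .select() is backed by the soupsieve package; as far as I know, older releases only implemented :nth-of-type() and rejected :nth-child().

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example only.
HTML = """
<div id="sidebar">
  <h4>Bulletin</h4>
  <div class="item"><a href="/a">first</a></div>
  <div class="item"><a href="/b">second</a></div>
  <div class="item"><a href="/c">third</a></div>
</div>
"""

# html.parser ships with Python, so no lxml install is needed here.
soup = BeautifulSoup(HTML, "html.parser")

# :nth-child(3) counts every element child of #sidebar, including the <h4>,
# so the third child is the second <div>.
by_child = soup.select("#sidebar > div:nth-child(3) > a")

# :nth-of-type(3) counts only the <div> children, so it is the third <div>.
by_type = soup.select("#sidebar > div:nth-of-type(3) > a")

# Alternative without pseudo-classes: collect the direct <div> children
# and index into the resulting list.
direct_divs = soup.find(id="sidebar").find_all("div", recursive=False)
by_index = direct_divs[2].a

child_text = [a.get_text() for a in by_child]
type_text = [a.get_text() for a in by_type]
index_text = by_index.get_text()

print(child_text, type_text, index_text)
```

Plain indexing over find_all() may read more naturally than either pseudo-class when only the position among siblings of the same tag matters.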