I am trying to scrape the div id="ideas_body" element from this site, but it seems to be missing. I have tried the different parsers suggested in the linked post, Missing parts on Beautiful Soup results, but none of them worked.
Here is my code, followed by the parsers I tried without success:
import requests
from bs4 import BeautifulSoup
import lxml
# Fetch the page
url = 'https://www.com/ideas#'
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)
# Parsers tried, one at a time
soup = BeautifulSoup(page.content, 'lxml-xml')
soup = BeautifulSoup(page.content, 'html.parser')
soup = BeautifulSoup(page.content, 'html.parser-xml')
So how can I parse this ID in order to scrape it?
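Answer 0 (score: 1)
As mentioned earlier in the comments, there is no need to scrape the page. You can call the API directly to get the data you need.
If you need more than 30 results, change 'per_page' in form_data.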
import requests
form_data = {'type': 'idea',
'show': 'all',
'sort': 'new',
'per_page': 30,
'gotodate': '04/06/2019',
'ls': 'all',
'loc': 'all',
'marketcap_l': 0,
'shorten_name': 1
}
response = requests.post('https://www.valueinvestorsclub.com/messages/loadmsgs', data=form_data)
ideas = response.json()['result']
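If you want to change 'per_page' or 'gotodate' without editing the dictionary by hand, the request above could be wrapped roughly as follows. This is only a sketch: the endpoint and form fields are copied from the snippet above, while the names build_form_data and fetch_ideas, the timeout value, and the raise_for_status() check are my own additions. It assumes you pass in a requests.Session (or anything else with a compatible post method), and I have not checked how large a 'per_page' value the server accepts.

```python
ENDPOINT = 'https://www.valueinvestorsclub.com/messages/loadmsgs'


def build_form_data(per_page=30, gotodate='04/06/2019'):
    """Return the POST payload from the answer, with two fields adjustable.

    Only per_page and gotodate are parameters; every other field keeps
    the value used in the original snippet.
    """
    return {
        'type': 'idea',
        'show': 'all',
        'sort': 'new',
        'per_page': per_page,
        'gotodate': gotodate,
        'ls': 'all',
        'loc': 'all',
        'marketcap_l': 0,
        'shorten_name': 1,
    }


def fetch_ideas(session, per_page=30, gotodate='04/06/2019'):
    """POST the form data and return the 'result' entry of the JSON reply.

    `session` is assumed to be a requests.Session or any object exposing
    a compatible post(url, data=..., timeout=...) method.
    """
    response = session.post(
        ENDPOINT,
        data=build_form_data(per_page, gotodate),
        timeout=10,  # hypothetical value, added so a stalled request fails
    )
    # Raise on 4xx/5xx instead of trying to decode an error page as JSON.
    response.raise_for_status()
    return response.json()['result']
```

For example, after import requests, calling fetch_ideas(requests.Session(), per_page=60) would request 60 entries in a single call, provided the server honours that value.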
Hope that helps!