BeautifulSoup missing ID

Time: 2019-07-05 19:00:42

Tags: python xml web-scraping beautifulsoup

I'm trying to scrape the div with id="ideas_body" from this site, but it seems to be missing from the parsed results. I've already tried the different parsers linked to in Missing parts on Beautiful Soup results, but none of them have worked.

Here is my code:

import requests
from bs4 import BeautifulSoup
import lxml

# Set Soup
url = 'https://www.valueinvestorsclub.com/ideas#'
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)

and the parsers I've tried unsuccessfully:

  1. soup = BeautifulSoup(page.content, 'lxml')
  2. soup = BeautifulSoup(page.content, 'lxml-xml')
  3. soup = BeautifulSoup(page.content, 'html.parser')
  4. soup = BeautifulSoup(page.content, 'html.parser-xml')
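
A quick way to confirm the element really is absent from the HTML that requests receives (a minimal sketch reusing the page object above; if both checks come back empty, the div is most likely filled in by JavaScript and will never appear in the static page, no matter which parser is used):

soup = BeautifulSoup(page.content, 'html.parser')

# find() returns None when no tag with this id exists in the fetched markup
print(soup.find(id='ideas_body'))

# a raw substring check on the response text rules out parser issues entirely
print('ideas_body' in page.text)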

So how can I parse this ID so that it can be scraped?

1 Answer:

Answer 0 (score: 1)

As mentioned earlier in the comments, there is no need to scrape the page. You can simply call the API to get the data you need.

If you need more than 30 results, change 'per_page' in form_data.

import requests


form_data = {'type': 'idea',
             'show': 'all',
             'sort': 'new',
             'per_page': 30,
             'gotodate': '04/06/2019',
             'ls': 'all',
             'loc': 'all',
             'marketcap_l': 0,
             'shorten_name': 1
             }

response = requests.post('https://www.valueinvestorsclub.com/messages/loadmsgs', data=form_data)

ideas = response.json()['result']
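
If you need more ideas per call, here is a minimal follow-up (reusing form_data and the endpoint above; it is an assumption that the endpoint honours larger per_page values, and since the structure of the individual entries in 'result' is not shown here, only the count is printed):

# raise 'per_page' and re-post the same form to pull more ideas in one request
form_data['per_page'] = 100
response = requests.post('https://www.valueinvestorsclub.com/messages/loadmsgs', data=form_data)

ideas = response.json()['result']
print(len(ideas))  # how many ideas the endpoint returned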

Hope that helps!