BeautifulSoup missing ID

Time: 2019-07-05 19:00:42

Tags: python xml web-scraping beautifulsoup

I'm trying to scrape the div with id="ideas_body" from this site, but it seems to be missing from the parsed results. I've already tried the different parsers linked to in Missing parts on Beautiful Soup results, but none of them have worked.

Here is my code:

import requests
from bs4 import BeautifulSoup
import lxml

# Set Soup
url = 'https://www.valueinvestorsclub.com/ideas#'
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)

and the parsers I've tried unsuccessfully:

  1. soup = BeautifulSoup(page.content, 'lxml')
  2. soup = BeautifulSoup(page.content, 'lxml-xml')
  3. soup = BeautifulSoup(page.content, 'html.parser')
  4. soup = BeautifulSoup(page.content, 'html.parser-xml')
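
A quick way to confirm the element really is absent from the HTML that requests receives (a minimal sketch reusing the page object above; if both checks come back empty, the div is most likely filled in by JavaScript and will never appear in the static page, no matter which parser is used):

soup = BeautifulSoup(page.content, 'html.parser')

# find() returns None when no tag with this id exists in the fetched markup
print(soup.find(id='ideas_body'))

# a raw substring check on the response text rules out parser issues entirely
print('ideas_body' in page.text)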

So how can I parse this ID so that it can be scraped?

1 Answer:

Answer 0 (score: 1)

As mentioned earlier in the comments, there is no need to scrape the page. You can simply call the API to get the data you need.

If you need more than 30 results, change 'per_page' in form_data.

import requests


form_data = {'type': 'idea',
             'show': 'all',
             'sort': 'new',
             'per_page': 30,
             'gotodate': '04/06/2019',
             'ls': 'all',
             'loc': 'all',
             'marketcap_l': 0,
             'shorten_name': 1
             }

response = requests.post('https://www.valueinvestorsclub.com/messages/loadmsgs', data=form_data)

ideas = response.json()['result']
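
If you need more ideas per call, here is a minimal follow-up (reusing form_data and the endpoint above; it is an assumption that the endpoint honours larger per_page values, and since the structure of the individual entries in 'result' is not shown here, only the count is printed):

# raise 'per_page' and re-post the same form to pull more ideas in one request
form_data['per_page'] = 100
response = requests.post('https://www.valueinvestorsclub.com/messages/loadmsgs', data=form_data)

ideas = response.json()['result']
print(len(ideas))  # how many ideas the endpoint returned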

Hope that helps!