Parsing LeetCode problem content with requests and BeautifulSoup

Time: 2019-06-15 07:51:29

Tags: python web-scraping beautifulsoup python-requests

I am trying to parse the content of interview questions on LeetCode.

For example, on https://leetcode.com/problems/two-sum/, I am trying to get:

Given an array of integers, return indices of the two numbers such that they add up to a specific target.

You may assume that each input would have exactly one solution, and you may not use the same element twice.

It doesn't seem hard. I use requests and BeautifulSoup to do this:

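    import requests
    from bs4 import BeautifulSoup

    def fetch_page():
        url = 'https://leetcode.com/graphql/two-sum'
        try:
            # without a timeout, requests waits indefinitely and never raises a timeout error
            page = requests.get(url, timeout=10)
        except (requests.exceptions.ReadTimeout, requests.exceptions.ConnectTimeout):
            print('time out')
            return 'time out'

        # parse the raw HTML returned by the server and print it
        soup = BeautifulSoup(page.content, 'html.parser')
        print(soup.prettify())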

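However, as you can see from the page response in the developer console (F12), the response does not include the content displayed on the page.

Is there a way to get this content?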

3 answers:

Answer 0: (score: 2)

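You don't need Selenium. The page fetches its dynamic content with a POST request: it sends a GraphQL query to the backend. So it is much faster to send that request yourself with requests and read the problem text from the JSON response.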
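The POST request described in this answer could look roughly like the sketch below. It assumes the endpoint is https://leetcode.com/graphql and that the schema exposes a `question(titleSlug: ...)` field with an HTML `content` field. Neither is a documented API, so both may change, and the site may also ask for extra headers or cookies. The names `slug_from_url`, `build_payload` and `fetch_problem_text` are made up for this example.

```python
from urllib.parse import urlparse

# Assumption: problem data is served from this GraphQL endpoint.
GRAPHQL_URL = "https://leetcode.com/graphql"

# Assumption: the schema has question(titleSlug) with `title` and `content` fields.
QUERY = """
query questionData($titleSlug: String!) {
  question(titleSlug: $titleSlug) {
    title
    content
  }
}
"""


def slug_from_url(problem_url):
    """Return the problem slug, e.g. 'two-sum', from a problem page URL."""
    parts = [p for p in urlparse(problem_url).path.split("/") if p]
    # Expected path shape: /problems/<slug>/
    if len(parts) < 2 or parts[0] != "problems":
        raise ValueError("not a problem URL: " + problem_url)
    return parts[1]


def build_payload(slug):
    """Build the JSON body of the GraphQL POST request for one problem."""
    return {
        "operationName": "questionData",
        "variables": {"titleSlug": slug},
        "query": QUERY,
    }


def fetch_problem_text(problem_url, timeout=10):
    """POST the query and return the problem description as plain text."""
    # Third-party imports are local so the helpers above work without them.
    import requests
    from bs4 import BeautifulSoup

    response = requests.post(
        GRAPHQL_URL,
        json=build_payload(slug_from_url(problem_url)),
        timeout=timeout,
    )
    response.raise_for_status()
    # The description comes back as an HTML fragment inside the JSON body.
    html = response.json()["data"]["question"]["content"]
    return BeautifulSoup(html, "html.parser").get_text()


# Example (needs network access):
# print(fetch_problem_text("https://leetcode.com/problems/two-sum/"))
```

Since the response is JSON, BeautifulSoup is only needed to strip the HTML tags from the `content` field, not to parse the whole page.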

Answer 1: (score: 1)

You need to let the page's JavaScript run and then read the page content. The easiest way to do that is with Selenium.

    from time import sleep
    import os

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    # initialise the browser (Selenium 4 style; chromedriver sits in the working directory)
    browser = webdriver.Chrome(service=Service(os.path.join(os.getcwd(), 'chromedriver')))
    # load the page
    browser.get('https://leetcode.com/problems/two-sum/')

    # wait for the page's JavaScript to render the content
    sleep(5)

    # get the selected content; the class name is generated and may have changed since
    problem_description = browser.find_element(By.CLASS_NAME, 'question-content__JfgR')
    print(problem_description.text)

    # close the browser
    browser.quit()

Output:

Given an array of integers, return indices of the two numbers such that they add up to a specific target.
You may assume that each input would have exactly one solution, and you may not use the same element twice.
Example:
Given nums = [2, 7, 11, 15], target = 9,

Because nums[0] + nums[1] = 2 + 7 = 9,
return [0, 1].

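Answer 2: (score: 0)

The site is generated by dynamically executed JavaScript, so you won't get the content with requests alone. You can use Selenium to drive a Firefox browser.

See the tutorial.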
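A minimal sketch of the Firefox approach, assuming Selenium 4 with a Firefox driver it can find. The function name `fetch_rendered_text` is made up for this example, and the CSS selector is left to the caller because the class names on the page are generated and may change. Instead of sleeping for a fixed time, it waits until the element is visible.

```python
def fetch_rendered_text(url, css_selector, timeout=15):
    """Open url in Firefox and return the text of the first element
    matching css_selector once it is visible."""
    # Selenium is imported here so that loading this module does not require it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Poll until the JavaScript-rendered element is visible, or time out.
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return element.text
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


# Example (opens a real browser; the selector is an assumption based on the
# 'question-content__JfgR' class name used in the answer above):
# print(fetch_rendered_text("https://leetcode.com/problems/two-sum/",
#                           "div[class^='question-content']"))
```

If the element never appears, `WebDriverWait.until` raises a `TimeoutException`, which is easier to diagnose than an empty result after a fixed sleep.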