Question

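I'm trying to scrape a web page with Python 2.7, and the page contains a table that has to load first. When I scrape it, all I get is "Loading" or "Sorry, we don't have any information about it", because the table has not finished loading yet.

I have read several articles and tried their code, but nothing worked.

My code: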


import urllib2, sys
from BeautifulSoup import BeautifulSoup
import json

site= "https://www.flightradar24.com/data/airports/bud/arrivals"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site,headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
nev = soup.find('h1' , attrs={'class' : 'airport-name'})
print nev

table = soup.find('div', { "class" : "row cnt-schedule-table" })
print table


import urllib2
from bs4 import BeautifulSoup
import json

# new url      
url = 'https://www.flightradar24.com/data/airports/bud/arrivals'

# read all data
page = urllib2.urlopen(url).read()

# convert json text to python dictionary
data = json.loads(page)

print(data['row cnt-schedule-table'])


Answer 1

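I ran into this problem too. You can use the Python selenium package. We need to wait for your table to load, so I used time.sleep(), but that is not the right way to do it. You can use the wait.until("element") method instead. Sample code is below: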

from bs4 import BeautifulSoup
from selenium import webdriver
import time

# Start Firefox with English as the preferred content language
profile = webdriver.FirefoxProfile()
profile.set_preference("intl.accept_languages", "en-us")
driver = webdriver.Firefox(firefox_profile=profile)

driver.get("https://www.flightradar24.com/data/airports/bud/arrivals")
# Crude fixed delay so the table has time to load
time.sleep(10)

# Parse the page as rendered by the browser
html_source = driver.page_source
soup = BeautifulSoup(html_source, "html.parser")
print soup
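
The answer notes that a fixed time.sleep() is not the right way to wait. A minimal sketch of the polling idea behind an explicit wait follows. The helper names wait_until and wait_for_elements are hypothetical (they are not part of Selenium), and the sketch assumes only that the driver has a find_elements(by, value) method, as Selenium's WebDriver does. It should run on both Python 2.7 and Python 3.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises RuntimeError once `timeout` seconds pass.
    (Hypothetical helper for illustration, not a Selenium API.)
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %s seconds" % timeout)
        time.sleep(poll_interval)


def wait_for_elements(driver, css_selector, timeout=10.0, poll_interval=0.5):
    """Poll a Selenium-style driver until css_selector matches something.

    "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
    (Hypothetical helper for illustration, not a Selenium API.)
    """
    return wait_until(
        lambda: driver.find_elements("css selector", css_selector),
        timeout=timeout,
        poll_interval=poll_interval,
    )
```

In the script above, time.sleep(10) could then be replaced by something like wait_for_elements(driver, "div.cnt-schedule-table", timeout=30). The class name is taken from the question, and it is only an assumption that this div is present once the table has loaded. Selenium also ships its own explicit wait, WebDriverWait in selenium.webdriver.support.ui, whose until() method polls in a similar way.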

Reference link:

Selenium waitForElement

How do I scrape a web page from a site that has a table that loads?
