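I am trying to scrape a web page using BeautifulSoup and Python 2.7.
The request works fine, but the parsing is incomplete. No matter how long the real table is, it seems to stop at around 1668 cells.
Here is the code: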
import requests
from bs4 import BeautifulSoup

url = 'http://fse.vdkruijssen.eu/ferrylist.php'
params = {'selectplane': 'Cessna 208 Caravan', 'submit': ''}
response = requests.post(url, data=params)
soup = BeautifulSoup(response.text, "lxml")
table = soup.find(id="ferryplane")
# class_=True keeps only rows that have a class attribute, which skips the rows without text
for tr in table.find_all('tr', class_=True):
    row = [cell.text for cell in tr.find_all('td')]
    print(row)
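How can I retrieve all the cells?
I am very new to web scraping, so any help would be much appreciated.
Thanks!
Edit: Apparently there is nothing wrong with the code. As the screenshot shows, I still get a truncated response (see the last line). If you have any idea what causes this, please let me know!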
Answer 0 (score: 0)
import requests
from bs4 import BeautifulSoup

url = 'http://fse.vdkruijssen.eu/ferrylist.php'
params = {'selectplane': 'Cessna 208 Caravan', 'submit': ''}
response = requests.post(url, data=params)
soup = BeautifulSoup(response.text, "lxml")
table = soup.find(id="ferryplane")
# class_=True keeps only rows that have a class attribute, which skips the rows without text
for tr in table.find_all('tr', class_=True):
    row = [cell.text for cell in tr.find_all('td')]
    print(row)
Out:
['HB-TCK', 'Badenflug (carbonex)', 'LSZS', 'LSMU', '67', '1000', '670', '348', '419']
['RPC-3255', 'Bank of FSE', 'WAMR', 'RPLV', '910', '110', '1001', '-3374', '-2405']
['I-FGTY', 'Bank of FSE', 'LGEL', 'LIBN', '284', '110', '312', '-1428', '-925']
['ZT-YMC', 'Bank of FSE', 'FLEB', 'FAUT', '1230', '110', '1353', '-4560', '-3251']
['CS-PRB', 'PRA Rentals (Matt74)', 'LZKZ', 'EDDK', '561', '175', '982', '-1908', '-1180']
['ZU-YTU', 'Bank of FSE', 'FABE', 'FAJS', '409', '110', '450', '-2008', '-1300']
['ZS-FXN', 'cckohrs', 'FYML', 'FALA', '548', '200', '1096', '-2668', '-1377']
['HL-7227', 'Bank of FSE', 'RJOB', 'RKSO', '360', '110', '396', '-1483', '-971']
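Since the same code produces rows here, one way to narrow down a truncated result is to check whether the markup was already cut off when it arrived or whether the parser dropped cells. The snippet below is only a rough sketch: `diagnose_truncation` is a hypothetical helper name, and both the closing `</html>` check and the `<td` count are heuristics I am assuming are good enough for this page, not a guaranteed test.

```python
import re


def diagnose_truncation(html, parsed_cell_count):
    """Guess where cells go missing: in the response or in the parser.

    html              -- raw markup as received (e.g. response.text)
    parsed_cell_count -- number of <td> elements the parser returned
    """
    # Heuristic: count opening <td> tags directly in the raw markup.
    raw_cells = len(re.findall(r'<td\b', html, flags=re.IGNORECASE))
    # Heuristic: a complete page is assumed to end with a closing </html> tag.
    looks_complete = html.rstrip().lower().endswith('</html>')

    if not looks_complete:
        return 'response truncated'
    if parsed_cell_count < raw_cells:
        return 'parser dropped cells'
    return 'ok'
```

With the script above it could be called as `diagnose_truncation(response.text, len(soup.find_all('td')))`. If it reports a truncated response, the problem is upstream of BeautifulSoup (server or network). If it reports dropped cells, trying a different parser such as "html.parser" or "html5lib" in place of "lxml" may be worth a shot.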