Incomplete table parsing with BeautifulSoup in Python

Posted: 2017-02-08 17:30:50

Tags: python-2.7 xml-parsing beautifulsoup

I am trying to scrape a web page using BeautifulSoup and Python 2.7.

The request works fine, but the parsing is incomplete: it seems to stop at around 1668 table cells, regardless of the table's actual length.

Here is the code:

import requests
from bs4 import BeautifulSoup

url = 'http://fse.vdkruijssen.eu/ferrylist.php'

params = {'selectplane': 'Cessna 208 Caravan', 'submit': ''}
response = requests.post(url, data=params)

soup = BeautifulSoup(response.text, "lxml")
table = soup.find(id="ferryplane")
for tr in table.find_all('tr', class_=True):  # only rows with a class attribute contain data
    row = [cell.text for cell in tr.find_all('td')]
    print(row)

How can I retrieve all the cells?

I am new to web scraping, so any help would be greatly appreciated.

Thanks!

EDIT: Apparently the code itself is fine. As shown in the screenshot, I am still getting a truncated response (the last row is cut off). If you have any idea what could be causing this, please let me know!

(screenshot of the truncated output)
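Before blaming the parser, it helps to check whether the rows are even present in the raw HTTP response. Below is a minimal diagnostic sketch, using a stand-in HTML string rather than the real page's markup: count occurrences of `<tr` in the raw text and compare with what a parser actually recovers. (It uses the standard-library parser so it runs without third-party packages; with BeautifulSoup you would compare against `len(table.find_all('tr'))` instead.)

```python
from html.parser import HTMLParser  # on Python 2.7: from HTMLParser import HTMLParser

# Stand-in for response.text; the real page's markup will differ.
sample_html = (
    "<table id='ferryplane'>"
    + "".join("<tr class='data'><td>cell %d</td></tr>" % i for i in range(2000))
    + "</table>"
)

# 1) Row count in the raw text. If this is already lower than expected,
#    the HTTP response itself is truncated, not the parsing.
raw_count = sample_html.count("<tr")

# 2) Row count recovered by an actual parser.
class RowCounter(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1

counter = RowCounter()
counter.feed(sample_html)

print(raw_count, counter.rows)  # equal counts mean the parse is complete
```

On the real page you would substitute `response.text` for `sample_html`. If the raw count matches the expected table size but the parsed count is lower, the parser backend is dropping content.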

1 Answer:

Answer 0 (score: 0)

import requests
from bs4 import BeautifulSoup

url = 'http://fse.vdkruijssen.eu/ferrylist.php'

params = {'selectplane': 'Cessna 208 Caravan', 'submit': ''}
response = requests.post(url, data=params)

soup = BeautifulSoup(response.text, "lxml")
table = soup.find(id="ferryplane")
for tr in table.find_all('tr', class_=True):  # only rows with a class attribute contain data
    row = [cell.text for cell in tr.find_all('td')]
    print(row)

Out:

['HB-TCK', 'Badenflug (carbonex)', 'LSZS', 'LSMU', '67', '1000', '670', '348', '419']
['RPC-3255', 'Bank of FSE', 'WAMR', 'RPLV', '910', '110', '1001', '-3374', '-2405']
['I-FGTY', 'Bank of FSE', 'LGEL', 'LIBN', '284', '110', '312', '-1428', '-925']
['ZT-YMC', 'Bank of FSE', 'FLEB', 'FAUT', '1230', '110', '1353', '-4560', '-3251']
['CS-PRB', 'PRA Rentals (Matt74)', 'LZKZ', 'EDDK', '561', '175', '982', '-1908', '-1180']
['ZU-YTU', 'Bank of FSE', 'FABE', 'FAJS', '409', '110', '450', '-2008', '-1300']
['ZS-FXN', 'cckohrs', 'FYML', 'FALA', '548', '200', '1096', '-2668', '-1377']
['HL-7227', 'Bank of FSE', 'RJOB', 'RKSO', '360', '110', '396', '-1483', '-971']

I have confirmed that no rows are missing: (screenshot of the full output)
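If the raw response does contain all the rows, the usual suspect is the parser backend: lxml can silently stop at badly malformed markup, while html5lib and the built-in "html.parser" are more forgiving. A sketch of comparing backends on a deliberately malformed snippet follows; the snippet is made up for illustration, and on the real page you would feed each backend `response.text` instead.

```python
from bs4 import BeautifulSoup

# Made-up markup with unclosed <tr>/<td> tags, standing in for a messy page.
malformed = (
    "<table id='ferryplane'>"
    "<tr class='data'><td>row 1"
    "<tr class='data'><td>row 2"
    "</table>"
)

results = {}
for backend in ("html.parser", "lxml", "html5lib"):
    try:
        soup = BeautifulSoup(malformed, backend)
        results[backend] = len(soup.find_all("tr"))
    except Exception:
        results[backend] = None  # backend not installed

print(results)
```

If one backend reports fewer rows than the others on the real page, switching the second argument of `BeautifulSoup(...)` to a more lenient parser (html5lib is the most tolerant, at the cost of speed) is the fix.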