Question

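I'm trying to scrape a table from a website. Everything works and the script runs without errors, but when I open the result in the CSV I see that more than the table was scraped: text plus the table, when all I need is the table.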

The table only starts at row 53 of the CSV, and I don't understand why. Why is my code scraping the text as well, and not just the table?

My code:

from bs4 import BeautifulSoup
from selenium import webdriver
import time
import unicodecsv as csv

filename = r'output.csv'

resultcsv = open(filename, "wb")
output = csv.writer(resultcsv, delimiter=';', quotechar='"',
                    quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')

profile = webdriver.FirefoxProfile()
profile.set_preference("intl.accept_languages", "en-us")
driver = webdriver.Firefox(firefox_profile=profile)
driver.get("https://www.flightradar24.com/data/airports/bud/arrivals")
time.sleep(10)
html_source = driver.page_source
soup = BeautifulSoup(html_source, "html.parser")
print(soup)

table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })

datatable = []
for record in table.find_all('tr'):
    temp_data = []
    for data in record.find_all('td'):
        temp_data.append(data.text.encode('latin-1'))
    datatable.append(temp_data)

output.writerows(datatable)

resultcsv.close()

Answer 1

Your table contains all of those extra lines inside tr tags as well, which is why they get appended along with the rows you want.

You need to filter on the class of the tags you expect. In your case this should work:

for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
    temp_data = []
    for data in record.find_all("td"):
        temp_data.append(data.text.encode('latin-1'))
    datatable.append(temp_data)
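
To show the filtering idea without a browser, here is a minimal Python 3 sketch that runs against a small, made-up HTML snippet instead of the live page. The markup, the `data-table` class on the table, the `row-date-separator` class, and the flight values are all assumptions for illustration; only the `hidden-xs hidden-sm ng-scope` row class comes from the code above. Note one swap: passing a multi-class string to `class_` matches the class attribute as an exact string, so it stops matching if the site reorders the classes. The sketch uses a CSS selector through `select()` instead, which does not depend on class order. It also writes the CSV with the standard `csv` module rather than `unicodecsv`.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for driver.page_source.
SAMPLE_HTML = """
<table class="data-table">
  <thead><tr><th>Time</th><th>Flight</th><th>From</th></tr></thead>
  <tbody>
    <tr class="row-date-separator"><td colspan="3">Sample date line</td></tr>
    <tr class="hidden-xs hidden-sm ng-scope"><td>10:05</td><td>XX 101</td><td>City A (AAA)</td></tr>
    <tr class="hidden-md hidden-lg ng-scope"><td colspan="3">10:05 XX 101 City A (AAA)</td></tr>
    <tr class="hidden-xs hidden-sm ng-scope"><td>10:20</td><td>XX 202</td><td>City B (BBB)</td></tr>
  </tbody>
</table>
"""


def extract_rows(html, row_selector="tr.hidden-xs.hidden-sm"):
    """Return the cell text of every table row matched by row_selector."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="data-table")
    rows = []
    for record in table.select(row_selector):
        cells = [td.get_text(strip=True) for td in record.find_all("td")]
        if cells:  # skip rows that have no <td> cells, such as the header row
            rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize rows with the same delimiter and quoting as the question."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", quotechar='"',
                        quoting=csv.QUOTE_NONNUMERIC)
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Calling `extract_rows(SAMPLE_HTML, "tr")` on the same snippet also returns the separator row and the duplicated mobile-layout row, which is the kind of extra text the question describes. In Python 3 the cell text can stay as `str`; to write a real file, open it with `open(filename, "w", newline="", encoding="latin-1")` and pass that to `csv.writer` in place of the in-memory buffer.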
