How to scroll down to the end of a dynamic page using Selenium Python ChromeDriver

Time: 2020-04-13 15:44:40

Tags: javascript python web-scraping beautifulsoup

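Please help me. I am trying to scroll down to the end of a dynamic page and get the HTML code, but it does not work properly. I tried this. It only scrolls down once. I changed the sleep time from 2 to more than 5, and then it only scrolls down twice before breaking out of the while loop. The page is here.

Any help would be greatly appreciated.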
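
For context, the scroll-until-the-height-stops-growing loop that the question describes can be sketched roughly as below. This is a minimal sketch, not the asker's original code (the linked snippet is not reproduced here). The `settle_checks` parameter is an assumption added for illustration: it tolerates a few unchanged height readings before giving up, which may help when content loads slowly. `FakePage` is a hypothetical stand-in so the sketch runs without a browser; with Selenium, a real WebDriver instance would be passed instead, since it exposes the same `execute_script` method.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=50, settle_checks=3):
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    unchanged = 0
    scrolls = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            unchanged += 1
            # Stop only after several consecutive unchanged readings
            if unchanged >= settle_checks:
                break
        else:
            unchanged = 0
            last_height = new_height
    return scrolls


class FakePage:
    """Hypothetical stand-in for a WebDriver: grows 1000px per scroll, up to 3000px."""

    def __init__(self):
        self.height = 1000

    def execute_script(self, script):
        if script.startswith("return"):
            return self.height
        self.height = min(self.height + 1000, 3000)


page = FakePage()
scroll_count = scroll_to_bottom(page, pause=0, settle_checks=2)
print(scroll_count, page.height)
```

Breaking out on the first unchanged reading is one plausible reason a loop like this stops after one or two scrolls, though that is a guess without seeing the original code.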

2 answers:

Answer 0: (score: 0)

Selenium is not needed. The data is rendered in the source HTML as a JSON structure.

import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.tgmotorsales.com/pre-owned-cars?results='
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# The inventory is embedded as JSON in the data-json attribute of this div
jsonStr = soup.find('div', {'id': 'ds-vehicles-json'})['data-json']
jsonData = json.loads(jsonStr)
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize (pandas >= 1.0)
df = pd.json_normalize(jsonData)

Output:

print (df.head().to_string())
         AccidentIndicatorsText  Age BasicExteriorColor BasicExteriorColorSwatch BodyType BodyTypeName  CabType CabTypeName  CarHighestCost  CarLowestCost                                CarfaxFeedText  CarfaxOk     CarfaxText CategoryId CategoryIdList CategoryName CategoryType  CertifiedStatus ChromeCode                                      ComingSoonUrl                                  ComingSoonUrlBase Comments  Condition  CreatedBy            CreatedOn  CreatorUserType  Cylinders DamageType DamageTypeName DataOptionSelected DealerCost  DealerId  DealerLocationId  DestinationPrice  DisabilityEquipped  Discount  DiscountPrice  DiscountType  DiscountValue  Doors        DriveTrain  EndRow             Engine  FDiff FactoryColor         FactoryColorText FactoryInterior FactoryInteriorText FileName  FinalPrice                                      FirstImageUrl           Fuel FuelCapacity FuelEconomyCity FuelEconomyHighway FuelName GrossVehicleWeightRating  HasHighlightedFeatures         HorsePower  ImageCount ImageUrls Images          InStockDate  InternetPrice  InvoicePrice InvoiceValue  IsActive  IsAutoTrader  IsBasicColor  IsCarsCom  IsCudl  IsCustomerSaved  IsDeleted  IsEdmunds  IsHighlighted  IsInboundLocked  IsInspected  IsNew  IsNewPrice  IsOnSale  IsOptional IsPakageInstalled  IsPriceLocked  IsPriority  IsPromotionLock  IsPublishedQualityControlInspectionReport  IsPublishedVehicleBuyersGuid  IsRecycler  IsRemoved  IsSelected  IsShowMsrpInvoice  IsSmogChecked  IsSold  IsStockManual  IsUniversalPromotionText  IsUpdateNow LastModifiedBy LicensePlate LicensePlateState  MDiff  Make  MakeId MakeName MakeOther  Mileage  MoDiff     Model ModelId  ModifiedBy           ModifiedOn  ModifierUserType  MsrpPrice NewPrice          NonWaterMarkedImageUrl Notes OldInternetPrice OptionCodes OptionDescription Options OwnershipText PackageValue Pakage PakageSelected        PriceLockDate PriceLockNotes         PromoExpires          PromoStarts  PromotionCode PromotionDescription 
PromotionText  Rank            RemovedOn RetailValue        ScheduledDate SearchText  SequenceNumber SortColumn SortDirection  Source  StartRow StateCode Status           StatusDate StockNo StockNumberFirst StockNumberSettingType StockNumberStartsWith  StockSettingId                                Style  StyleId SubCategoryName  Title TitleState TitleStatus Torque Transmission  TransmissionType          Trim TruckBedLength TruckBedWidth  UpdateGroupId  VehicleId                    VehicleImage                          VehicleImageHash  VehicleImageId VehicleInventorySource VehicleInventorySourceScript VehicleInventoryUpdateSource VehicleInventoryUpdateSourceScript                    VehicleName VehicleTitle  VideoCount                Vin  Warranty  YDiff  Year
0          1 Accidents Reported    0               None                     None        D         None        0        None               0              0                                 Carfax Report      True  Carfax Report       None           None         None         None                0       None  //images.dealersync.com/cloud/userdocumentprod...  /2714/Photos/comingsoon/450e3bd07f284a09b7225e...     None          2          0  0001-01-01T00:00:00                0          6       None                              None       None      2714               756               0.0                   0       0.0            0.0         False            0.0      0   All Wheel Drive       0  3.0L  6 Cylinders      0      #000000          Brilliant Black         #000000               Black     None     13000.0  //images.dealersync.com/cloud/userdocumentprod...  Gasoline Fuel         None              19                 28     None                     None                   False  310 hp @ 5500 rpm          35      None     []  2020-02-24T00:00:00        13000.0           0.0         None     False         False         False      False   False            False      False      False          False            False        False  False       False     False       False              None          False       False            False                                      False                         False       False      False       False              False          False    True          False                     False        False           None         None              None      0  Audi       0     None      None    90060       0        A6    None           0  0001-01-01T00:00:00                 0    49900.0     None  20200225004903504_IMG_4695.jpg  None             None        None              None      []      2 Owners         None   None           None  0001-01-01T00:00:00           None  0001-01-01T00:00:00  0001-01-01T00:00:00              0                 None    
      None     0  0001-01-01T00:00:00        None  0001-01-01T00:00:00       None               0       None          None       0         0      None      S  0001-01-01T00:00:00    1347             None                   None                  None               0         4dr Sdn quattro 3.0T Premium        0            None      1       None       Clear   None    Automatic                 1  3.0T Premium           None          None              0     496489  20200225004903504_IMG_4695.jpg  919e844a50786ba437eea8ca98fb1b15c02f9d62               0                   None                         None                         None                               None      2012 Audi A6 3.0T Premium         None           0  WAUBGAFCXCN003858         0      0  2012
1  No Accidents/Damage Reported    0               None                     None        C         None        0        None               0              0  Carfax Report - No Accidents/Damage Reported      True  Carfax Report       None           None         None         None                0       None  //images.dealersync.com/cloud/userdocumentprod...  /2714/Photos/comingsoon/450e3bd07f284a09b7225e...     None          2          0  0001-01-01T00:00:00                0          4       None                              None       None      2714               756               0.0                   0       0.0            0.0         False            0.0      0   All Wheel Drive       0  1.8L  4 Cylinders      0      #CDCDCD     Lake Silver Metallic                               Ebony     None      7950.0  //images.dealersync.com/cloud/userdocumentprod...  Gasoline Fuel         None              20                 28     None                     None                   False  225 hp @ 5900 rpm          32      None     []  2019-12-18T00:00:00         7950.0           0.0         None     False         False         False      False   False            False      False      False          False            False        False  False       False     False       False              None          False       False            False                                      False                         False       False      False       False              False          False    True          False                     False        False           None         None              None      0  Audi       0     None      None    80000       0        TT    None           0  0001-01-01T00:00:00                 0    39600.0     None  20200102234214276_IMG_3807.jpg  None             None        None              None      []      3 Owners         None   None           None  0001-01-01T00:00:00           None  0001-01-01T00:00:00  0001-01-01T00:00:00              0                 None    
      None     0  0001-01-01T00:00:00        None  0001-01-01T00:00:00       None               0       None          None       0         0      None      S  0001-01-01T00:00:00    1311             None                   None                  None               0                2dr Cpe quattro 6-Spd        0            None      1       None       Clear   None       Manual                 2                         None          None              0     470937  20200102234214276_IMG_3807.jpg  5710e20e7ff2a325b0bbdd65b6155058d583fab7               0                   None                         None                         None                               None                  2002 Audi TT          None           0  TRUWT28N221000808         0      0  2002
2  No Accidents/Damage Reported    0               None                     None        D         None        0        None               0              0  Carfax Report - No Accidents/Damage Reported      True  Carfax Report       None           None         None         None                0       None  //images.dealersync.com/cloud/userdocumentprod...  /2714/Photos/comingsoon/450e3bd07f284a09b7225e...     None          2          0  0001-01-01T00:00:00                0          6       None                              None       None      2714               756               0.0                   0       0.0            0.0         False            0.0      0   All Wheel Drive       0  3.0L  6 Cylinders      0      #000000  Black Sapphire Metallic                               Beige     None      8950.0  //images.dealersync.com/cloud/userdocumentprod...  Gasoline Fuel         None              17                 25     None                     None                   False  230 hp @ 6500 rpm          47      None     []  2020-02-24T00:00:00         8950.0           0.0         None     False         False         False      False   False            False      False      False          False            False        False  False       False     False       False              None          False       False            False                                      False                         False       False      False       False              False          False   False          False                     False        False           None         None              None      0   BMW       0     None      None   107223       0  3 Series    None           0  0001-01-01T00:00:00                 0    36600.0     None  20200326000458548_IMG_4870.jpg  None             None        None              None      []      5 Owners         None   None           None  0001-01-01T00:00:00           None  0001-01-01T00:00:00  0001-01-01T00:00:00              0                 None    
      None     0  0001-01-01T00:00:00        None  0001-01-01T00:00:00       None               0       None          None       0         0      None      I  0001-01-01T00:00:00    1345             None                   None                  None               0        4dr Sdn 328i xDrive AWD SULEV        0            None      1       None       Clear   None    Automatic                 1   328i xDrive           None          None              0     502702  20200326000458548_IMG_4870.jpg  6fb61e8026cc85b3f2e312de4b1a3e3140e7d56c               0                   None                         None                         None                               None  2011 BMW 3 Series 328i xDrive         None           0  WBAPK5C50BF124652         0      0  2011
3  No Accidents/Damage Reported    0               None                     None        D         None        0        None               0              0  Carfax Report - No Accidents/Damage Reported      True  Carfax Report       None           None         None         None                0       None  //images.dealersync.com/cloud/userdocumentprod...  /2714/Photos/comingsoon/450e3bd07f284a09b7225e...     None          2          0  0001-01-01T00:00:00                0          6       None                              None       None      2714               756               0.0                   0       0.0            0.0         False            0.0      0  Rear Wheel Drive       0  3.0L  6 Cylinders      0      #010101  Black Sapphire Metallic                               Beige     None      6950.0  //images.dealersync.com/cloud/userdocumentprod...  Gasoline Fuel         None              18                 28     None                     None                   False  230 hp @ 6500 rpm          37      None     []  2020-01-02T00:00:00         6950.0           0.0         None     False         False         False      False   False            False      False      False          False            False        False  False       False     False       False              None          False       False            False                                      False                         False       False      False       False              False          False    True          False                     False        False           None         None              None      0   BMW       0     None      None   104650       0  3 Series    None           0  0001-01-01T00:00:00                 0    42595.0     None  20200108013341357_IMG_4019.jpg  None             None        None              None      []      3 Owners         None   None           None  0001-01-01T00:00:00           None  0001-01-01T00:00:00  0001-01-01T00:00:00              0                 None    
      None     0  0001-01-01T00:00:00        None  0001-01-01T00:00:00       None               0       None          None       0         0      None      S  0001-01-01T00:00:00    1314             None                   None                  None               0  4dr Sdn 328i RWD SULEV South Africa        0            None      1       None       Clear   None    Automatic                 1          328i           None          None              0     473172  20200108013341357_IMG_4019.jpg  378ef180ba92675b9d573dc232a516152a2b7d59               0                   None                         None                         None                               None         2010 BMW 3 Series 328i         None           0  WBAPH5G52ANM36680         0      0  2010
4  No Accidents/Damage Reported    0               None                     None        C         None        0        None               0              0  Carfax Report - No Accidents/Damage Reported      True  Carfax Report       None           None         None         None                0       None  //images.dealersync.com/cloud/userdocumentprod...  /2714/Photos/comingsoon/450e3bd07f284a09b7225e...     None          2          0  0001-01-01T00:00:00                0          4       None                              None       None      2714               756               0.0                   0       0.0            0.0         False            0.0      0   All Wheel Drive       0  2.0L  4 Cylinders      0      #E9EEE8             Alpine White         #000000               Black     None     16500.0  //images.dealersync.com/cloud/userdocumentprod...  Gasoline Fuel         None              22                 33     None                     None                   False  240 hp @ 5000 rpm          41      None     []  2020-03-10T00:00:00        16500.0           0.0         None     False         False         False      False   False            False      False      False          False            False        False  False       False     False       False              None          False       False            False                                      False                         False       False      False       False              False          False   False          False                     False        False           None         None              None      0   BMW       0     None      None    82460       0  4 Series    None           0  0001-01-01T00:00:00                 0    57650.0     None  20200311010048004_IMG_4926.jpg  None             None        None              None      []      2 Owners         None   None           None  0001-01-01T00:00:00           None  0001-01-01T00:00:00  0001-01-01T00:00:00              0                 None    
      None     0  0001-01-01T00:00:00        None  0001-01-01T00:00:00       None               0       None          None       0         0      None      I  0001-01-01T00:00:00    1356             None                   None                  None               0        2dr Cpe 428i xDrive AWD SULEV        0            None      1       None       Clear   None    Automatic                 1   428i xDrive           None          None              0     504580  20200311010048004_IMG_4926.jpg  63a2ff0985a8e2276d33ab7dca1ae40221e528a5               0                   None                         None                         None                               None  2014 BMW 4 Series 428i xDrive         None           0  WBA3N9C56EK245257         0      0  2014
....
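
To see the extraction step in isolation, here is a minimal offline sketch of pulling JSON out of a `data-json` attribute. It swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages; `SAMPLE_HTML` is made-up sample markup, not the real page, and the `DataJsonExtractor` and `extract_embedded_json` names are hypothetical.

```python
import json
from html.parser import HTMLParser


class DataJsonExtractor(HTMLParser):
    """Remember the data-json attribute of the first element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.payload = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self.payload is None and attributes.get("id") == self.element_id:
            self.payload = attributes.get("data-json")


def extract_embedded_json(html, element_id="ds-vehicles-json"):
    """Return the parsed JSON stored in the element's data-json attribute."""
    parser = DataJsonExtractor(element_id)
    parser.feed(html)
    if parser.payload is None:
        raise ValueError(f"no data-json attribute found on element {element_id!r}")
    return json.loads(parser.payload)


# Made-up sample markup that mimics the structure described in the answer
SAMPLE_HTML = """
<html><body>
<div id="ds-vehicles-json"
     data-json='[{"Make": "Audi", "Year": 2012}, {"Make": "BMW", "Year": 2011}]'></div>
</body></html>
"""

vehicles = extract_embedded_json(SAMPLE_HTML)
print([vehicle["Make"] for vehicle in vehicles])
```

Because the listings arrive with the initial HTML, no scrolling or JavaScript execution is involved in this approach.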

Answer 1: (score: 0)

You can call the API directly, as follows:

import csv
import json
import requests

params = {
    "BodyType": "",
    "Year": "",
    "Make": "",
    "Model": "",
    "PriceRange": "",
    "PriceStart": "",
    "PriceEnd": "",
    "Condition": "pre-owned-cars",
    "Color": "",
    "InteriorColor": "",
    "CityMpg": "",
    "HighwayMpg": "",
    "Transmission": "",
    "DriveTrain": "",
    "Fuel": "",
    "SearchExpression": "",
    "SortCriteria": "",
    "SortDirection": "",
    "LocationId": "",
    "IsCertified": "-1",
    "IsSold": "",
    "IsFuzzySearch": "false",
    "startIndex": "1",
    "Results": "60"
}

names = ["VehicleName", "Engine", "Transmission",
         "FuelEconomyCity", "FuelEconomyHighway", "StockNo", "Vin", "IsSold", "Mileage"]


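def main(url):
    # The endpoint returns JSON; the listings are under the 'vehicles' key
    r = requests.get(url, params=params).json()
    with open("data.csv", 'w', newline="") as f:
        writer = csv.writer(f)
        writer.writerow(names)  # header row
        for item in r['vehicles']:
            # Keep only the columns listed in names
            writer.writerow([item[name] for name in names])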


main("https://www.tgmotorsales.com/Inventory/Search")

Output: view-online

Short pandas version:

import pandas as pd

def main(url):
    # Reuses the params dict and the requests import from the snippet above
    r = requests.get(url, params=params).json()
    df = pd.DataFrame(r['vehicles'])
    df.to_csv("data.csv", index=False)


main("https://www.tgmotorsales.com/Inventory/Search")
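
If the inventory ever exceeds the 60 results requested above, the `startIndex` and `Results` parameters suggest the endpoint can be paged. The sketch below is an untested assumption about how that paging might work: it treats `startIndex` as a 1-based item offset, which the answer does not confirm. `fetch_all`, `fetch_page`, and `fake_fetch_page` are hypothetical names, and the fake inventory stands in for the real endpoint so that the loop can run offline.

```python
def fetch_all(fetch_page, page_size=60, max_pages=100):
    """Collect items page by page until a short or empty page comes back."""
    items = []
    start_index = 1  # ASSUMPTION: startIndex is a 1-based item offset
    for _ in range(max_pages):  # hard cap in case the server ignores the offset
        page = fetch_page(start_index, page_size)
        items.extend(page)
        if len(page) < page_size:
            break
        start_index += page_size
    return items


# Stand-in for the real endpoint: 130 fake listings served in slices
FAKE_INVENTORY = [{"StockNo": str(n)} for n in range(1, 131)]


def fake_fetch_page(start_index, page_size):
    begin = start_index - 1
    return FAKE_INVENTORY[begin:begin + page_size]


all_items = fetch_all(fake_fetch_page, page_size=60)
print(len(all_items))
```

Against the real site, `fetch_page` could wrap the request from the answer, for example by returning `requests.get(url, params={**params, "startIndex": str(start_index), "Results": str(page_size)}).json()["vehicles"]`. Whether the server interprets `startIndex` this way would need to be checked against actual responses.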