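I am trying to scrape data from a web page. The link is https://www.cardekho.com/compare-cars. On this page, once the car models and their variants have been selected in the drop-down menus and passed in the URL, I need to scrape the comparison table of car data and specifications. Here is my sample code.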
---
- name: test
  hosts: localhost
  tasks:
    - name: Installation of postgresql-9.6
      apt:
        name: postgresql-9.6
    - name: start postgresql service
      service: name=postgresql state=restarted enabled=yes
    - name: create a database
      postgresql_db:
        name: managys
        encoding: UTF-8
        template: template0
        state: present
      become_user: postgres
      become: yes
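The problem is that, because of the URL, I am not getting the exact data I need. If I supply four car models and their variants to compare, the page returns data for that model chosen at random from the drop-down menu.
Can anyone explain how I can solve this and get the exact data I need from this URL?
Any help would be appreciated.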
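Answer 0: (score: 0)
You are doing a lot of work to parse these tables. pandas can do that for you with .read_html().
It returns a list of DataFrames. Just pick the DataFrame you want and write it to CSV with pandas' .to_csv().
If it were me, I would condense this into a loop that iterates over the tables, but I have written it out in full so you can see it broken down (if that helps).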
import pandas as pd
url = 'https://www.cardekho.com/compare/maruti-gypsy-and-maruti-omni.htm'
# read_html returns one DataFrame per <table> found on the page
# (it needs an HTML parser such as lxml installed)
tables = pd.read_html(url)
# the first table's header holds the car names; reuse them for the other tables
compare_cols = list(tables[0].columns[1:])
overview = tables[0]
engine = tables[1]
engine.columns = [engine.columns[0]] + compare_cols
transmission = tables[2]
transmission.columns = [transmission.columns[0]] + compare_cols
steering = tables[3]
steering.columns = [steering.columns[0]] + compare_cols
brakes_system = tables[4]
brakes_system.columns = [brakes_system.columns[0]] + compare_cols
# write each table to its own CSV file, without the row index
overview.to_csv('D:/CarDekho_Data/maruti/maruti_2/overview.csv', index=False)
engine.to_csv('D:/CarDekho_Data/maruti/maruti_2/engine.csv', index=False)
transmission.to_csv('D:/CarDekho_Data/maruti/maruti_2/transmission.csv', index=False)
steering.to_csv('D:/CarDekho_Data/maruti/maruti_2/steering.csv', index=False)
brakes_system.to_csv('D:/CarDekho_Data/maruti/maruti_2/brakes_system.csv', index=False)
Output:
print (overview)
Overview ... Omni
0 On Road Price ... Rs.3,36,883*
1 Fuel Type ... Petrol
2 Engine Displacement (cc) ... 796
3 Available Colors ... Fantasy BlackMetallic silky silverMetallic Pea...
4 Body Type ... Minivan
5 Max Power ... 34.2bhp@5000rpm
6 User Reviews ... 4.5Based on 45 Reviews
7 Mileage (ARAI) ... 16.8 kmpl
8 Cargo Volume ... 210-litres
9 Fuel Tank Capacity ... 35Litres
10 Seating Capacity ... 5
11 Transmission Type ... Manual
12 Offers & Discount ... 1 OfferView now
13 Finance Available (EMI) ... Rs.6,510 Check Now
14 Insurance SaveBig ... Rs.17,146Know how
15 Service Cost ... Rs.2,996
16 NaN ... NaN
17 Air Conditioner ... No
18 Cd Player ... No
19 Anti Lock Braking System ... No
20 Power Steering ... No
21 Power Windows Front ... No
22 Power Windows Rear ... No
23 Leather Seats ... No
24 Speed Sensing Auto Door Lock ... No
25 Impact Sensing Auto Door Unlock ... -
26 Air Conditioner ... No
27 Heater ... No
28 Adjustable Steering ... No
29 Tachometer ... No
.. ... ... ...
47 Adjustable Headlights ... Yes
48 Fog Lights Front ... No
49 Fog Lights Rear ... No
50 Power Adjustable Exterior Rear View Mirror ... No
51 Manually Adjustable Ext Rear View Mirror ... Yes
52 Electric Folding Rear View Mirror ... No
53 Rain Sensing Wiper ... No
54 Rear Window Wiper ... No
55 Rear Window Washer ... No
56 Rear Window Defogger ... No
57 Wheel Covers ... No
58 Alloy Wheels ... No
59 Power Antenna ... No
60 Tinted Glass ... No
61 Rear Spoiler ... No
62 Removable Or Convertible Top ... No
63 Roof Carrier ... No
64 Sun Roof ... No
65 Moon Roof ... No
66 Side Stepper ... No
67 Outside Rear View Mirror Turn Indicators ... No
68 Integrated Antenna ... No
69 Chrome Grille ... No
70 Chrome Garnish ... No
71 Smoke Headlamps ... No
72 Roof Rail ... No
73 Lighting ... No
74 Trunk Opener ... Lever
75 Additional Features ... 2 Speed Windshield WiperFront And Rear Thermop...
76 Heated Wing Mirror ... No
[77 rows x 3 columns]
...
print (engine)
Engine ... Omni
0 Type ... In-Line Engine
1 Displacement ... 796
2 Max Power ... 34.2bhp@5000rpm
3 Year ... 2010
4 Max Torque ... 59Nm@2500rpm
5 Description ... 0.8-litre 34.2bhp 6V In-Line Engine
6 No Of Cylinder ... 3
7 Valves Per Cylinder ... 2
8 Valve Configuration ... SOHC
9 Fuel Supply System ... MPFI
10 Bore XStroke ... No
11 Compression Ratio ... No
12 Turbo Charger ... No
13 Super Charger ... No
[14 rows x 3 columns]
etc.
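For reference, the loop version mentioned in the answer might look roughly like the sketch below. The function name align_and_save and its arguments are made up for illustration and are not part of pandas. It assumes, as the expanded code does, that the first table's header carries the car names and that every table has the same number of columns.

```python
from pathlib import Path

import pandas as pd


def align_and_save(tables, names, out_dir):
    """Relabel each table with the first table's car-name headers and write one CSV per table.

    `tables` is a list of DataFrames (e.g. the result of pd.read_html),
    `names` gives the file stem for each table, in the same order.
    This helper is hypothetical, written only to illustrate the loop.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # The first table's header holds the car names; reuse them everywhere.
    compare_cols = list(tables[0].columns[1:])

    written = []
    for name, table in zip(names, tables):
        table = table.copy()  # leave the caller's DataFrame untouched
        # Keep the table's own first header (e.g. "Engine"), replace the rest.
        table.columns = [table.columns[0]] + compare_cols
        path = out_dir / f"{name}.csv"
        table.to_csv(path, index=False)
        written.append(path)
    return written
```

With the URL from the answer, the call would be something like align_and_save(pd.read_html(url), ['overview', 'engine', 'transmission', 'steering', 'brakes_system'], 'D:/CarDekho_Data/maruti/maruti_2'). Since zip stops at the shorter sequence, any tables beyond the listed names are simply skipped.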