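I'm working on a web scraping project.
I've run into a problem: I run a for loop over a list, but everything comes back as one block.
My goal is to split out each item in the list and save it as a separate variable so it shows up as its own column in the DataFrame, but instead I'm getting a single block of text.
How can I do this?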
import requests
from bs4 import BeautifulSoup
import pandas
page_link = ("https://www.autotrader.co.uk/car-search?sort=price-asc&"
"radius=1500&postcode=lu15jf&onesearchad=Used&"
"onesearchad=Nearly%20New&onesearchad=New&make=AUDI&model=A5"
"&price-to=8500&year-from=2008&maximum-mileage=90000"
"&transmission=Automatic&exclude-writeoff-categories=on")
request = requests.get(page_link)
conn = request.content
soup = BeautifulSoup(conn, "html.parser")
cars = soup.find_all("h2", {"class":"listing-title title-wrap"})
cars_specs = soup.find_all('ul', {"class" :'listing-key-specs '})
carlist = []
for car, specs in zip(cars, cars_specs):
    dic = {}
    dic["Car Model"] = car.find("a", {"class": "js-click-handler listing-fpa-link"}).text
    # specs.text joins the text of every item in the list into one string
    dic["Specs"] = specs.text
    carlist.append(dic)
df = pandas.DataFrame(carlist)
df
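Another option, offered here as an alternative rather than taken from the answer, is to split the specs while scraping: `specs.text` joins every item of the `ul` into one string, whereas iterating over its `<li>` children keeps them apart. This sketch assumes each spec is an `<li>` inside `ul.listing-key-specs` (not checked against the live page) and parses a small made-up HTML snippet instead of fetching the URL, so the listing names and markup are placeholders.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup standing in for one results page; the real page
# structure (one <li> per spec) is an assumption.
SAMPLE_HTML = """
<h2 class="listing-title title-wrap">
  <a class="js-click-handler listing-fpa-link" href="#">Example Audi A5 (1)</a>
</h2>
<ul class="listing-key-specs ">
  <li>2008 (08 reg)</li><li>Coupe</li><li>77,500 miles</li><li>2.7L</li>
  <li>187bhp</li><li>Automatic</li><li>Diesel</li>
</ul>
<h2 class="listing-title title-wrap">
  <a class="js-click-handler listing-fpa-link" href="#">Example Audi A5 (2)</a>
</h2>
<ul class="listing-key-specs ">
  <li>2008 (08 reg)</li><li>Coupe</li><li>74,000 miles</li><li>3.2L</li>
  <li>261bhp</li><li>Automatic</li><li>Petrol</li>
</ul>
"""


def scrape_listings(html):
    """Build one DataFrame row per listing, one column per <li> spec."""
    soup = BeautifulSoup(html, "html.parser")
    titles = soup.find_all("h2", class_="listing-title")
    spec_lists = soup.find_all("ul", class_="listing-key-specs")
    rows = []
    for title, specs in zip(titles, spec_lists):
        link = title.find("a", class_="listing-fpa-link")
        row = {"Car Model": link.get_text(strip=True)}
        # Each <li> becomes its own Spec_N entry instead of one text block
        for i, li in enumerate(specs.find_all("li"), start=1):
            row[f"Spec_{i}"] = li.get_text(strip=True)
        rows.append(row)
    return pd.DataFrame(rows)


df = scrape_listings(SAMPLE_HTML)
print(df)
```

Listings with fewer `<li>` items simply get NaN in the trailing `Spec_` columns when pandas builds the DataFrame from the list of dicts.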
Answer (score: 1)
I think this does what you need:
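import numpy as np
# Split each Specs text block on newlines into separate columns
df1 = df.Specs.str.split(pat='\n', expand=True)
# Turn empty strings into NaN, then drop columns that are entirely empty
df1 = df1.replace('', np.nan)
df1 = df1.dropna(axis=1, how='all')
# Prefix the remaining column labels with 'Spec_'
df1.columns = ['Spec_' + str(x) for x in list(df1)]
df1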
   Spec_1         Spec_2  Spec_3        Spec_4  Spec_5  Spec_6     Spec_7
0  2008 (08 reg)  Coupe   77,500 miles  2.7L    187bhp  Automatic  Diesel
1  2008 (58 reg)  Coupe   69,170 miles  2.7L    187bhp  Automatic  Diesel
2  2008 (58 reg)  Coupe   84,700 miles  2.7L    187bhp  Automatic  Diesel
3  2008 (58 reg)  Coupe   53,800 miles  2.7L    187bhp  Automatic  Diesel
4  2009 (09 reg)  Coupe   85,000 miles  2.7L    187bhp  Automatic  Diesel
5  2008 (08 reg)  Coupe   74,000 miles  3.2L    261bhp  Automatic  Petrol
6  2008 (08 reg)  Coupe   67,000 miles  3.2L    261bhp  Automatic  Petrol
7  2008           Coupe   90,000 miles  2.7L    187bhp  Automatic  Diesel
8  2008 (58 reg)  Coupe   59,277 miles  2.7L    187bhp  Automatic  Diesel
9  2009 (09 reg)  Coupe   78,412 miles  2.7L    187bhp  Automatic  Diesel
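To check the answer end to end without hitting the site, the sketch below runs the same steps on a two-row, hand-made `Specs` column built from values in the output above, then gives the positional columns descriptive names. The leading and trailing newlines in the sample strings are an assumption (suggested by the output starting at `Spec_1`, meaning column 0 was empty), and the names `Year`, `Body`, `Mileage` and so on are hypothetical labels chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for the scraped "Specs" column; the leading and trailing
# newlines are an assumption about what specs.text returns
df = pd.DataFrame({
    "Specs": [
        "\n2008 (08 reg)\nCoupe\n77,500 miles\n2.7L\n187bhp\nAutomatic\nDiesel\n",
        "\n2008\nCoupe\n90,000 miles\n2.7L\n187bhp\nAutomatic\nDiesel\n",
    ]
})

# Same steps as the answer: split, blank -> NaN, drop all-empty columns
df1 = df.Specs.str.split(pat="\n", expand=True)
df1 = df1.replace("", np.nan)
df1 = df1.dropna(axis=1, how="all")
df1.columns = ["Spec_" + str(x) for x in df1.columns]

# Hypothetical descriptive names, assuming every row lists its specs in this order
names = ["Year", "Body", "Mileage", "Engine", "Power", "Transmission", "Fuel"]
cars = df1.rename(columns=dict(zip(df1.columns, names)))

# Example of working with a named column: "77,500 miles" -> 77500
cars["Mileage"] = cars["Mileage"].str.replace(r"\D", "", regex=True).astype(int)
print(cars)
```

Splitting by position only works if every listing reports the same specs in the same order; a listing with a missing field would shift its remaining values one column to the left.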