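I am scraping housing data from zoopla.co.uk.
The DataFrame appears to print correctly, but pandas only writes the last element (the last house) to the csv file.
I also tried converting each object to a list inside the pd.DataFrame({}) call, but that did not change the csv output.
Code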
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd

my_url = 'https://www.zoopla.co.uk/for-sale/property/b23/?page_size=100&q=B23&radius=0&results_sort=newest_listings&search_source=refine'
res = requests.get(my_url)
soup = BeautifulSoup(res.text, "html.parser")
lis = soup.find("ul", class_="listing-results clearfix js-gtm-list").find_all("li", class_="srp clearfix")

for li in lis:
    bedrooms = li.find("span", class_="num-beds")
    bathrooms = li.find("span", class_="num-baths")
    price = li.find("a", class_="text-price")
    house_price = re.findall(r'£(\d+)', str(price))
    style = li.find("h2", class_="listing-results-attr")
    house_type = re.findall(r'(?<=bed ).*(?= for)', str(style))
    distance = li.find("li", class_="clearfix")
    station_distance = re.findall(r'\d+\.?\d*', str(distance))
    if bedrooms:
        bedrooms = bedrooms.get_text(strip=True)
    if bathrooms:
        bathrooms = bathrooms.get_text(strip=True)
    if house_price:
        house_price = house_price
    if house_type:
        house_type = house_type
    if station_distance:
        station_distance = station_distance
    df = pd.DataFrame({'house_price': house_price, 'house_type': house_type, 'station_distance': station_distance, 'bedrooms': bedrooms, 'bathrooms': bathrooms})
    print(df)
    df.to_csv('zoopla.csv')
Output
   house_price  house_type  station_distance  bedrooms  bathrooms
0           90        flat               0.2         1          1
   house_price      house_type  station_distance  bedrooms  bathrooms
0          210  detached house               0.6         3       None
   house_price         house_type  station_distance  bedrooms  bathrooms
0          160  end terrace house               0.7         2          1
   house_price      house_type  station_distance  bedrooms  bathrooms
0          325  detached house               1.2         4          1
   house_price           house_type  station_distance  bedrooms  bathrooms
0          195  semi-detached house               1.1         3          1
   house_price      house_type  station_distance  bedrooms  bathrooms
0           24  terraced house               0.9         3       None
   house_price  house_type  station_distance  bedrooms  bathrooms
0          115        flat               0.2         2          1
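Answer 0 (score: 1)
You are overwriting the DataFrame on every iteration, so only the last house is left when the csv file is written.
Instead, collect one row per listing in a list and build the DataFrame once, after the loop: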
result = []
for li in lis:
    ...
    result.append({'house_price': house_price, 'house_type': house_type, 'station_distance': station_distance, 'bedrooms': bedrooms, 'bathrooms': bathrooms})

df = pd.DataFrame(result)
print(df)
df.to_csv('zoopla.csv')
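
To try the accumulate-then-build pattern without hitting the site, here is a minimal sketch that runs on its own. The `parsed_listings` tuples are stand-ins copied from the first three rows of the output in the question (nothing is fetched from Zoopla), and writing to an in-memory `io.StringIO` buffer instead of `zoopla.csv` is an assumption made only so the example has no side effects.

```python
import io

import pandas as pd

# Stand-in for the values parsed from each <li>; copied from the question's
# printed output, not scraped live. The name parsed_listings is hypothetical.
parsed_listings = [
    ("90", "flat", "0.2", "1", "1"),
    ("210", "detached house", "0.6", "3", None),
    ("160", "end terrace house", "0.7", "2", "1"),
]

rows = []
for house_price, house_type, station_distance, bedrooms, bathrooms in parsed_listings:
    # Append one dict per listing instead of creating a new DataFrame each time.
    rows.append({
        "house_price": house_price,
        "house_type": house_type,
        "station_distance": station_distance,
        "bedrooms": bedrooms,
        "bathrooms": bathrooms,
    })

# Build the DataFrame once, after the loop, so every listing is kept.
df = pd.DataFrame(rows)

# Write to an in-memory buffer (assumption: avoids touching the filesystem).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

One thing to watch in the real scraper: `re.findall` returns a list, so values such as `house_price` are lists rather than strings. Taking the first match before appending (for example `house_price[0] if house_price else None`) is probably what you want; otherwise each cell in the DataFrame holds a list.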