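This is probably simple, but I'm very new to Python and can't even figure out where to start.
I wrote some code that successfully scrapes the data I want from a web page. My problem now is that I don't know how to export it to CSV. This is what my code looks like: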
import requests
import csv
from bs4 import BeautifulSoup

for numb in range(1, 3):
    urls = "http://www.blocket.se/bostad/uthyres?cg_multi=3020&cg_multi=3100&cg_multi=3120&cg_multi=3060&cg_multi=3070&sort=&ss=&se=&ros=&roe=&bs=&be=&mre=&q=&q=&q=&save_search=1&l=0&md=th&o=" + str(numb) + "&f=p&f=c&f=b&ca=11&w=3"
    r = requests.get(urls)
    soup = BeautifulSoup(r.text, 'html.parser')
    data = soup.find_all("div", {"itemtype": "http://schema.org/Offer"})
    for item in data:
        try:
            print item.contents[3].find_all("span", {"class": "subject-param category"})[0].text
        except:
            pass
        try:
            print item.contents[3].find_all("span", {"class": "subject-param address separator"})[0].text
        except:
            pass
        try:
            print item.contents[3].find_all("span", {"class": "li_detail_params first rooms"})[0].text
        except:
            pass
        try:
            print item.contents[3].find_all("span", {"class": "li_detail_params monthly_rent"})[0].text
        except:
            pass
        try:
            print item.contents[3].find_all("span", {"class": "li_detail_params size"})[0].text
        except:
            pass
        try:
            print item.contents[3].find_all("span", {"class": "li_detail_params first weekly_rent_offseason"})[0].text
        except:
            pass
It prints:
lägenhet
Stockholms stad - Bromma
1 rum
4 000 kr/mån
villa
Linköping
100 m²
lägenhet
Stockholms stad - Maria, Gamla Stan, Högalid
1 rum
8 000 kr/mån
36 m²
lägenhet
Stockholms stad - Hägersten, Liljeholmen
1 rum
7 500 kr/mån
26 m²
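It's certainly not the best output, but I'm not really concerned about that. Could someone point me to how I would export this to CSV? As I said, I don't even know where to start.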
Answer 0 (score: 0)
Instead of printing each value, add it to a list. At the end, use csv.writer to write the list out to the console:
import unicodecsv as csv
from bs4 import BeautifulSoup
import requests
import StringIO

# Column order for the CSV; these are the keys stored in data_item below.
fields = ['category', 'address separator', 'first rooms',
          'monthly_rent', 'size', 'weekly_rent_offseason']

# Created once, before the page loop, so rows from every page are kept.
data_list = []
for numb in range(1, 3):
    urls = "http://www.blocket.se/bostad/uthyres?cg_multi=3020&cg_multi=3100&cg_multi=3120&cg_multi=3060&cg_multi=3070&sort=&ss=&se=&ros=&roe=&bs=&be=&mre=&q=&q=&q=&save_search=1&l=0&md=th&o=" + str(numb) + "&f=p&f=c&f=b&ca=11&w=3"
    r = requests.get(urls)
    soup = BeautifulSoup(r.text, 'html.parser')
    data = soup.find_all("div", {"itemtype": "http://schema.org/Offer"})
    for item in data:
        # One dict per listing; a field that is missing on the page is simply left out.
        data_item = {}
        try:
            data_item['category'] = item.contents[3].find_all("span", {"class": "subject-param category"})[0].text
        except:
            pass
        try:
            data_item['address separator'] = item.contents[3].find_all("span", {"class": "subject-param address separator"})[0].text
        except:
            pass
        try:
            data_item['first rooms'] = item.contents[3].find_all("span", {"class": "li_detail_params first rooms"})[0].text
        except:
            pass
        try:
            data_item['monthly_rent'] = item.contents[3].find_all("span", {"class": "li_detail_params monthly_rent"})[0].text
        except:
            pass
        try:
            data_item['size'] = item.contents[3].find_all("span", {"class": "li_detail_params size"})[0].text
        except:
            pass
        try:
            data_item['weekly_rent_offseason'] = item.contents[3].find_all("span", {"class": "li_detail_params first weekly_rent_offseason"})[0].text
        except:
            pass
        data_list.append(data_item)

out = StringIO.StringIO()
csv_writer = csv.writer(out)
for row in data_list:
    # Look fields up by name so a missing value leaves an empty cell
    # instead of shifting the remaining columns.
    csv_writer.writerow([row.get(field, '') for field in fields])
print out.getvalue()
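Beyond the base system, you will also need to install the third-party libraries the script imports: unicodecsv, BeautifulSoup (bs4), and requests.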
This did produce CSV output for me; let me know if it doesn't work for you.
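For what it's worth, the same "collect dicts, then write them out" idea can be sketched on Python 3 with only the standard library. This is a minimal sketch, not the answer's exact method: it swaps in csv.DictWriter with a fixed column list so that listings with missing fields still line up. The names FIELDNAMES, write_listings and sample are hypothetical, and the sample rows are taken from the output shown in the question, with my own guess at which value belongs to which field.

```python
import csv
import io

# Hypothetical column list, reusing the keys from the answer's data_item dict.
FIELDNAMES = ["category", "address separator", "first rooms",
              "monthly_rent", "size", "weekly_rent_offseason"]


def write_listings(rows, fileobj):
    """Write a list of dicts as CSV to an open text file object."""
    # restval="" leaves an empty cell for any field a listing does not have.
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Sample rows based on the printed output in the question (field mapping assumed).
sample = [
    {"category": "lägenhet", "address separator": "Stockholms stad - Bromma",
     "first rooms": "1 rum", "monthly_rent": "4 000 kr/mån"},
    {"category": "villa", "address separator": "Linköping", "size": "100 m²"},
    {"category": "lägenhet",
     "address separator": "Stockholms stad - Maria, Gamla Stan, Högalid",
     "first rooms": "1 rum", "monthly_rent": "8 000 kr/mån", "size": "36 m²"},
]

# Write to an in-memory buffer, much like the StringIO step in the answer.
buf = io.StringIO()
write_listings(sample, buf)
csv_text = buf.getvalue()
print(csv_text)
```

To get an actual file rather than console output, pass a file opened with open("listings.csv", "w", newline="", encoding="utf-8") instead of the buffer. The newline="" argument lets the csv module control line endings, and the address containing commas is quoted automatically.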