Exporting my web scraping results from Python to CSV

Posted: 2015-04-02 21:24:29

Tags: python csv

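This is probably very simple, but I'm very new to Python and I can't even figure out where to start.

I wrote some code that successfully scrapes the data I want from a web page. My problem now is that I don't know how to export it to CSV. This is what my code looks like: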

import requests
import csv
from bs4 import BeautifulSoup

for numb in range(1, 3):
    urls= "http://www.blocket.se/bostad/uthyres?cg_multi=3020&cg_multi=3100&cg_multi=3120&cg_multi=3060&cg_multi=3070&sort=&ss=&se=&ros=&roe=&bs=&be=&mre=&q=&q=&q=&save_search=1&l=0&md=th&o=" +str(numb) +"&f=p&f=c&f=b&ca=11&w=3"
    r = requests.get(urls)
    soup=BeautifulSoup(r.text, 'html.parser')
    data = soup.find_all("div", {"itemtype": "http://schema.org/Offer"})

    for item in data:
        try:
            print item.contents[3].find_all("span", {"class": "subject-param category"})[0].text
        except:
            pass
        try:
            print item.contents[3].find_all("span", {"class": "subject-param address separator"})[0].text
        except:
            pass
        try:
            print item.contents[3].find_all("span", {"class": "li_detail_params first rooms"})[0].text
        except:
            pass
        try:
            print item.contents[3].find_all("span", {"class": "li_detail_params monthly_rent"})[0].text
        except:
            pass
        try:
            print item.contents[3].find_all("span", {"class": "li_detail_params size"})[0].text  
        except:
            pass
        try:
            print item.contents[3].find_all("span", {"class": "li_detail_params first weekly_rent_offseason"})[0].text
        except:
            pass

It prints:

lägenhet

                Stockholms stad - Bromma

1 rum
4 000 kr/mån

            villa

                Linköping

100 m²

            lägenhet

                Stockholms stad - Maria, Gamla Stan, Högalid

1 rum
8 000 kr/mån
36 m²

            lägenhet

                Stockholms stad - Hägersten, Liljeholmen

1 rum
7 500 kr/mån
26 m²

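Of course it's not the nicest output, but I don't really care about that. Can someone point me to how I would export this to CSV? As I said, I don't even know where to start.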

1 answer:

Answer 0 (score: 0)

Instead of using print statements, add your information to a list. At the end, use csv.writer to write it out to the console:

# Python 2 code (print statement, StringIO module)
import StringIO

import requests
import unicodecsv as csv
from bs4 import BeautifulSoup

# (column name, CSS class of the <span> that holds the value)
FIELDS = [
    ('category', "subject-param category"),
    ('address separator', "subject-param address separator"),
    ('first rooms', "li_detail_params first rooms"),
    ('monthly_rent', "li_detail_params monthly_rent"),
    ('size', "li_detail_params size"),
    ('weekly_rent_offseason', "li_detail_params first weekly_rent_offseason"),
]

data_list = []  # collect rows from ALL pages, not just the last one
for numb in range(1, 3):
    url = "http://www.blocket.se/bostad/uthyres?cg_multi=3020&cg_multi=3100&cg_multi=3120&cg_multi=3060&cg_multi=3070&sort=&ss=&se=&ros=&roe=&bs=&be=&mre=&q=&q=&q=&save_search=1&l=0&md=th&o=" + str(numb) + "&f=p&f=c&f=b&ca=11&w=3"
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    data = soup.find_all("div", {"itemtype": "http://schema.org/Offer"})

    for item in data:
        data_item = {}
        for name, css_class in FIELDS:
            try:
                data_item[name] = item.contents[3].find_all("span", {"class": css_class})[0].text
            except (IndexError, AttributeError):
                pass  # this listing lacks the field; its cell stays empty
        data_list.append(data_item)

fieldnames = [name for name, _ in FIELDS]
out = StringIO.StringIO()
csv_writer = csv.writer(out)
csv_writer.writerow(fieldnames)  # header row
for data_item in data_list:
    # Fixed column order; missing fields become empty cells so columns line up
    csv_writer.writerow([data_item.get(name, u'') for name in fieldnames])
print out.getvalue()
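On Python 3 the standard library's csv.DictWriter can take care of the column order and the missing fields, and unicodecsv is not needed. A minimal sketch, assuming the scraping loop produces one dict per listing keyed by the same column names; the two rows here are hypothetical, transcribed and trimmed by hand from the output shown in the question, so no network access is involved:

```python
import csv
import io

FIELDNAMES = ["category", "address separator", "first rooms",
              "monthly_rent", "size", "weekly_rent_offseason"]

# Hypothetical rows, copied by hand from the question's printed output.
rows = [
    {"category": "lägenhet", "address separator": "Stockholms stad - Bromma",
     "first rooms": "1 rum", "monthly_rent": "4 000 kr/mån"},
    {"category": "villa", "address separator": "Linköping", "size": "100 m²"},
]


def rows_to_csv(rows, fieldnames):
    """Return a list of dicts as CSV text, with a header row."""
    out = io.StringIO()
    # restval="" writes an empty cell for every field a listing lacks.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(rows_to_csv(rows, FIELDNAMES))
```

Because DictWriter looks each value up by field name, a listing with missing fields (like the villa, which has no rooms or rent) still lines up with the header.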

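Besides a base Python installation, you will need to install the following libraries:

  1. unicodecsv - for writing non-ASCII characters
  2. beautifulsoup4 - for HTML parsing
  3. requests - for HTTP access

This did produce a CSV for me; let me know if it doesn't work for you.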
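To save the result to a file instead of printing it, Python 3's built-in csv module handles non-ASCII text such as "lägenhet" or "m²" as long as the file is opened with an explicit encoding. A minimal sketch; the file name and the header names are hypothetical, and the rows are hand-copied from the question's output:

```python
import csv

OUTPUT_PATH = "blocket_listings.csv"  # hypothetical file name


def save_rows(path, header, rows):
    """Write a header row and data rows (lists of str) to a UTF-8 CSV file."""
    # newline="" lets the csv module control line endings, as the csv docs
    # recommend; encoding="utf-8" stores å, ä, ö and ² correctly.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


header = ["category", "address", "rooms", "monthly_rent", "size"]
rows = [
    ["lägenhet", "Stockholms stad - Maria, Gamla Stan, Högalid",
     "1 rum", "8 000 kr/mån", "36 m²"],
    ["lägenhet", "Stockholms stad - Hägersten, Liljeholmen",
     "1 rum", "7 500 kr/mån", "26 m²"],
]
save_rows(OUTPUT_PATH, header, rows)
```

Addresses that contain commas are quoted automatically, so they stay in one column. If Excel shows the Swedish characters garbled, opening the file with encoding="utf-8-sig" (which writes a byte-order mark) often helps.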