Export a scraped table to CSV

Time: 2014-01-19 04:45:21

Tags: python csv web-scraping scrape

I scraped a table in Python using BeautifulSoup. The code is as follows:

import urllib2
from bs4 import BeautifulSoup
for i in range(0,39):
    first=urllib2.urlopen("http://www.admision.unmsm.edu.pe/res20130914/A/011/"+str(i)+".html").read()
    soup=BeautifulSoup(first)
    for tr in soup.find_all('tr')[2:]:
        tds = tr.find_all('td')
        print tds[0].text, tds[1].text, tds[2].text, tds[3].text

The output looks like this:

494560 ABAD SAAVEDRA, GERSON HORACIO 011 1116.8750
455314 ABAD VALVERDE, MARIA ISABEL 011 1482.7500
491005 ABREGU HUAMAN, MERCEDES LILIANA 011 503.4000
457929 ACOSTA ABAD, ALEJANDRO FRANCISCO 011 413.0500

So how can I export this table to CSV?

1 Answer:

Answer 0 (score: 3)

Use the csv module:

import csv
import urllib2
from bs4 import BeautifulSoup

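# Python 2: open in binary mode so the csv module controls line endings
with open('listing.csv', 'wb') as f:
    writer = csv.writer(f)
    for i in range(39):
        url = "http://www.admision.unmsm.edu.pe/res20130914/A/011/{}.html".format(i)
        u = urllib2.urlopen(url)
        try:
            html = u.read()
        finally:
            u.close()
        soup = BeautifulSoup(html)
        # Skip the first two rows, as in the question's code
        for tr in soup.find_all('tr')[2:]:
            tds = tr.find_all('td')
            # Python 2's csv module writes byte strings, so encode each cell
            row = [elem.text.encode('utf-8') for elem in tds[:4]]
            writer.writerow(row)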
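
The code above targets Python 2, where the file is opened with 'wb' and each cell is encoded to bytes. On Python 3 the csv module works with text instead, so the writing step looks slightly different. Below is a minimal sketch of only that step, assuming the rows have already been extracted as lists of strings. The sample rows are copied from the output in the question; `write_rows` and `rows_to_csv_text` are hypothetical helper names, not part of the original answer.

```python
import csv
import io

# Sample rows copied from the output shown in the question.
rows = [
    ["494560", "ABAD SAAVEDRA, GERSON HORACIO", "011", "1116.8750"],
    ["455314", "ABAD VALVERDE, MARIA ISABEL", "011", "1482.7500"],
]


def write_rows(rows, out):
    """Write an iterable of rows to an open text-mode file object as CSV."""
    writer = csv.writer(out)
    writer.writerows(rows)


def rows_to_csv_text(rows):
    """Return the CSV text for the rows (hypothetical helper for inspection)."""
    # newline="" leaves line endings to the csv module
    buf = io.StringIO(newline="")
    write_rows(rows, buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Python 3: text mode, newline="" and an explicit encoding replace
    # the Python 2 idiom of 'wb' plus .encode('utf-8') on every cell.
    with open("listing.csv", "w", newline="", encoding="utf-8") as f:
        write_rows(rows, f)
```

Because the names contain a comma, csv.writer quotes that field automatically, so the four columns survive a round trip through csv.reader.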