I scraped these tables from a website in Python using BeautifulSoup. The code is as follows:
import urllib2
from bs4 import BeautifulSoup

for i in range(0,39):
    first=urllib2.urlopen("http://www.admision.unmsm.edu.pe/res20130914/A/011/"+str(i)+".html").read()
    soup=BeautifulSoup(first)
    for tr in soup.find_all('tr')[2:]:
        tds = tr.find_all('td')
        print tds[0].text, tds[1].text, tds[2].text, tds[3].text
The result looks like this:
494560 ABAD SAAVEDRA, GERSON HORACIO 011 1116.8750
455314 ABAD VALVERDE, MARIA ISABEL 011 1482.7500
491005 ABREGU HUAMAN, MERCEDES LILIANA 011 503.4000
457929 ACOSTA ABAD, ALEJANDRO FRANCISCO 011 413.0500
So, how can I export this table to CSV?
Answer 0 (score: 3)
Use the csv module:
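import csv
import urllib2
from bs4 import BeautifulSoup

# Python 2: the csv module expects the file to be opened in binary mode.
with open('listing.csv', 'wb') as f:
    writer = csv.writer(f)
    for i in range(39):
        url = "http://www.admision.unmsm.edu.pe/res20130914/A/011/{}.html".format(i)
        u = urllib2.urlopen(url)
        try:
            html = u.read()
        finally:
            # Always close the connection, even if the read fails.
            u.close()
        soup=BeautifulSoup(html)
        # Skip the first two rows, as in the question's code.
        for tr in soup.find_all('tr')[2:]:
            tds = tr.find_all('td')
            # Take the first four cells and encode them for the Python 2 csv writer.
            row = [elem.text.encode('utf-8') for elem in tds[:4]]
            writer.writerow(row)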
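The code above targets Python 2. As a rough sketch of how the CSV-writing step might look on Python 3, where the csv module works with text rather than bytes: the file is opened in text mode with newline='' and the cells are written as plain strings, with no .encode('utf-8') call. The helper names write_rows and save_csv and the SAMPLE_ROWS list are illustrative assumptions, not part of the original answer; the sample rows are copied from the output shown in the question and stand in for the cells extracted from each tr.

```python
import csv
import io

# Sample rows copied from the output in the question; in the real script
# each row would come from [elem.text for elem in tds[:4]].
SAMPLE_ROWS = [
    ["494560", "ABAD SAAVEDRA, GERSON HORACIO", "011", "1116.8750"],
    ["455314", "ABAD VALVERDE, MARIA ISABEL", "011", "1482.7500"],
]


def write_rows(rows, fileobj):
    """Write each row (a list of str) as one CSV record (hypothetical helper)."""
    writer = csv.writer(fileobj)
    for row in rows:
        writer.writerow(row)


def save_csv(rows, path):
    """Save rows to a file on disk (hypothetical helper).

    In Python 3 the file is opened in text mode; newline='' lets the csv
    module control line endings itself.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        write_rows(rows, f)


# Demonstrate with an in-memory buffer instead of a real file.
buf = io.StringIO(newline="")
write_rows(SAMPLE_ROWS, buf)
print(buf.getvalue())
```

Because the name column contains a comma, the csv writer wraps that field in double quotes, so the file can be read back into the same four columns. Calling save_csv(SAMPLE_ROWS, 'listing.csv') would write the same records to disk.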