Question

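I am using Python 3.4 and I am trying to scrape the underlying data from the link below and dump it into a .csv file. I am currently using BeautifulSoup, and the first few lines of my script look like this: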

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the page and parse it; naming the parser explicitly avoids a bs4 warning
htmlfile = urlopen("https://secure.moneygram.com/estimate")
soup = BeautifulSoup(htmlfile, "html.parser")
print(soup.prettify()[0:1000])

Can anyone offer me some help?

Thanks

Answer 1

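If you need to log in, you will have to use splinter (a browser automation library). If you don't, and the data is plainly available in the page, you can extract it from the HTML with BeautifulSoup's find, findNext and findAll, or with splinter's find_by_name, find_by_id, find_by_css and so on. For example: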

    soop = htmltext.find('table',{"id":"noticeResults"}).findNext('tbody')

This code returns the data in the body (tbody) of the table whose id is "noticeResults".
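To make the lookup above concrete, here is a minimal sketch that runs against an inline HTML string instead of a live page. The sample markup, the column names, and the helper name extract_rows are all made up for illustration; the real page's structure may well differ, so treat this as a starting point rather than a working scraper for that site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page; the real markup may differ.
SAMPLE_HTML = """
<table id="noticeResults">
  <thead><tr><th>Country</th><th>Fee</th></tr></thead>
  <tbody>
    <tr><td>Mexico</td><td>5.00</td></tr>
    <tr><td>India</td><td>8.50</td></tr>
  </tbody>
</table>
"""


def extract_rows(html, table_id):
    """Return the body rows of the table with the given id as lists of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    if table is None:
        return []  # no table with that id on the page
    # find_next is the modern spelling of findNext
    tbody = table.find_next("tbody")
    if tbody is None:
        return []
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in tbody.find_all("tr")
    ]


rows = extract_rows(SAMPLE_HTML, "noticeResults")
print(rows)
```

Each inner list is one table row, which is the shape csv.writer expects when writing the result out.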

Answer 2

You should take a look at this: python BeautifulSoup parsing table

Then save it as CSV:

import csv

data = [...]  # your data coming from BS4

# Python 3: open in text mode with newline='' (binary 'wb' mode is the Python 2 idiom)
with open('csv_file.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for row in data:
        writer.writerow(row)
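For a version of the CSV step that can be run on its own, the sketch below writes a few made-up rows to a temporary file and reads them back. The sample rows, the header names, and the helpers write_rows and read_rows are assumptions for illustration only; in practice the rows would come from the parsed table.

```python
import csv
import os
import tempfile


def write_rows(path, header, rows):
    """Write a header line followed by the data rows to a CSV file."""
    # newline="" lets the csv module handle line endings itself
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    """Read the CSV file back as a list of rows (each row a list of strings)."""
    with open(path, newline="", encoding="utf-8") as csvfile:
        return list(csv.reader(csvfile))


# Made-up sample data standing in for rows scraped from a table.
sample_rows = [["Mexico", "5.00"], ["India", "8.50"]]
csv_path = os.path.join(tempfile.mkdtemp(), "csv_file.csv")

write_rows(csv_path, ["Country", "Fee"], sample_rows)
round_trip = read_rows(csv_path)
print(round_trip)
```

Reading the file back is a cheap way to confirm that the rows were written in the expected shape before pointing the script at real data.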

Web scraping to extract data from an underlying table
