How to scrape multiple pages and write the data to Excel

Posted: 2017-08-29 09:20:58

Tags: web-scraping python-3.6

How do I scrape multiple pages and write the results to Excel? For example, I want to scrape "http://econpy.pythonanywhere.com/ex/001.html". How do I scrape the next page when the total number of pages is unknown?

Also, I have written some code, but it writes None to the file instead of the data:

from bs4 import BeautifulSoup
from urllib.request import urlopen

page_url = "http://econpy.pythonanywhere.com/ex/001.html"
new_file = "Mynew.csv"
f = open(new_file, "w")
Headers = "Header1, Header2\n"
f.write(Headers)


html = urlopen(page_url)
soup = BeautifulSoup(html, "html.parser")
buyer_info = soup.find_all("div", {"title":"buyer-info"})
for i in buyer_info:
    Header1 = i.find("div", {"title":"buyer-name"})
    Header2 = i.find("span", {"class":"item-price"})
    salmon = print(Header1.get_text())
    salam = print(Header2.get_text())
    f.write("{}".format(salmon)+ "{}".format(salam))
f.close()

What am I doing wrong?

2 Answers:

Answer 0 (score: 0)

I solved it, but only for page 1. Here is the code:

from bs4 import BeautifulSoup
from urllib.request import urlopen

page_url = "http://econpy.pythonanywhere.com/ex/001.html"
new_file = "Mynew.csv"
f = open(new_file, "w")
Headers = "Header1,Header2\n"
f.write(Headers)

html = urlopen(page_url)
soup = BeautifulSoup(html, "html.parser") 
buyer_info = soup.find_all("div", {"title":"buyer-info"})
for i in buyer_info:
    Header1 = i.find("div", {"title":"buyer-name"})
    Header2 = i.find("span", {"class":"item-price"})
    f.write('{},{}\n'.format(Header1.text, Header2.text))
f.close()
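For reference, the None values in the original attempt came from assigning the return value of print(). print() writes its argument to stdout and always returns None, so the variables held None rather than the scraped text:

```python
# print() displays its argument on stdout but always returns None,
# so `salmon = print(Header1.get_text())` stores None, not the text.
result = print("buyer name")  # shows "buyer name" on stdout
print(result)                 # prints: None
# "{}".format(result) therefore produces the literal string "None",
# which is what ended up in the CSV file.
```

Writing the .text of each tag directly, as in the corrected code above, avoids this.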

Now comes the painful part: how do I paginate across multiple pages, i.e. how do I scrape the next page?

Answer 1 (score: 0)

Try this, and let me know if you have any questions. I used "css selectors" and "requests" to get it done.

import csv
import requests
from bs4 import BeautifulSoup

outfile = open('Mynew.csv', 'w',newline='')
writer = csv.writer(outfile)
writer.writerow(["Name","Price"])

for page in range(1,6):
    html = requests.get("http://econpy.pythonanywhere.com/ex/00{0}.html".format(page))
    soup = BeautifulSoup(html.text, "html.parser")
    for item in soup.select("div[title=buyer-info]"):
        Header1 = item.select_one("div[title=buyer-name]").get_text()
        Header2 = item.select_one("span.item-price").get_text()
        writer.writerow([Header1, Header2])
        print(Header1,Header2)
outfile.close()
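If the total page count really is unknown, a hard-coded range(1,6) will not do. One option is to keep requesting the next page until the server stops returning 200 or the page contains no buyer-info entries. A minimal sketch building on the answer above; the zero-padded 001.html, 002.html, ... URL pattern and both stopping conditions are assumptions about the site, not guarantees:

```python
import csv

import requests
from bs4 import BeautifulSoup

def page_url(page):
    # Assumed URL pattern from the example site: 001.html, 002.html, ...
    return "http://econpy.pythonanywhere.com/ex/{:03d}.html".format(page)

def scrape_all_pages(path="Mynew.csv"):
    """Scrape page 1, 2, 3, ... until a page is missing or empty."""
    with open(path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["Name", "Price"])
        page = 1
        while True:
            response = requests.get(page_url(page))
            if response.status_code != 200:
                break  # no such page: we have run out of pages
            soup = BeautifulSoup(response.text, "html.parser")
            rows = soup.select("div[title=buyer-info]")
            if not rows:
                break  # page exists but holds no buyer entries
            for item in rows:
                name = item.select_one("div[title=buyer-name]").get_text()
                price = item.select_one("span.item-price").get_text()
                writer.writerow([name, price])
            page += 1
```

Calling scrape_all_pages() then walks every page in sequence and stops on the first gap, so no page count needs to be known in advance.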