Unable to get the data into the correct format after web scraping in Python

Date: 2017-09-07 04:43:51

Tags: web-scraping beautifulsoup python-3.6

I wrote some code to scrape the site https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=-1&IsNodeId=1&Description=GTX&bop=And&Page={}&PageSize=36&order=BESTMATCH, filling in {} with the page number via .format(page).
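Long URLs like this are easy to break when they are wrapped across source lines. A minimal sketch that builds the same URL from its parameters with the standard library's urllib.parse.urlencode; the parameter names and values are copied from the URL above, and build_page_url is a hypothetical helper name. Whether Newegg still accepts these parameters is not known, since the site may have changed since 2017.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.newegg.com/Product/ProductList.aspx"


def build_page_url(page: int, page_size: int = 36) -> str:
    """Build the search-results URL for one page of the 'GTX' query."""
    params = {
        "Submit": "ENE",
        "N": "-1",
        "IsNodeId": "1",
        "Description": "GTX",
        "bop": "And",
        "Page": page,
        "PageSize": page_size,
        "order": "BESTMATCH",
    }
    # urlencode percent-escapes each value and joins the pairs with "&"
    return BASE_URL + "?" + urlencode(params)


print(build_page_url(2))
```

In the page loop, page_url = build_page_url(page) would then replace the hand-written string.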

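But when I run this code, the data does not come out in the right format: I want each product name in its own cell, and the same for the price and the image.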
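A likely cause (an assumption, since the scraped output is not shown): some product names and prices contain commas, for example a thousands separator in a price like 1,349.99, and writing fields joined by "," lets each such comma start a new column. This stdlib-only sketch, using made-up values, compares a plain comma join with csv.writer, which quotes any field that contains the delimiter.

```python
import csv
import io

# Made-up values; a comma inside a name or price is what shifts the columns.
row = ["Example GTX 1080 card, 8GB", "$1,349.99", "https://example.com/img.jpg"]

# Naive approach: join with commas, as a hand-built f.write(...) line does.
naive_line = ",".join(row)
naive_fields = next(csv.reader(io.StringIO(naive_line)))
print(len(naive_fields))  # 5: the embedded commas created extra columns

# csv.writer wraps fields containing the delimiter in quotes.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
quoted_fields = next(csv.reader(io.StringIO(buffer.getvalue())))
print(len(quoted_fields))  # 3: one column per value
```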

from urllib.request import urlopen
from bs4 import BeautifulSoup
f = open("Scrapedetails.csv", "w")
Headers = "Item_Name, Price, Image\n"
f.write(Headers)

for page in range(1,15):
    page_url = ("https://www.newegg.com/Product/ProductList.aspx?"
                "Submit=ENE&N=-1&IsNodeId=1&Description=GTX&bop=And&Page="
                "{}&PageSize=36&order=BESTMATCH").format(page)
    html = urlopen(page_url)
    bs0bj = BeautifulSoup(html, "html.parser")
    page_details = bs0bj.find_all("div", {"class":"item-container"})
    for i in page_details:
        Item_Name = i.find("a", {"class":"item-title"})
        Price = i.find("li", {"class":"price-current"})
        Image = i.find("img")
        Name_item = Item_Name.get_text()
        Prin = Price.get_text()
        imgf = Image["src"]  # read the src attribute
        f.write("{}".format(Name_item).strip() + ",{}".format(Prin).strip() +
                ",{}".format(imgf) + "\n")
f.close()

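Could someone help me modify the code so that names end up in the name column, prices in the price column, and images in the image column? Also, what is a better way to save data to a CSV file? Could someone help me with the code?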
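For the second question, one common approach is the standard library's csv module. This sketch assumes the rows have already been collected as (name, price, image_url) tuples; save_products is a hypothetical helper name. csv.writer quotes any value containing a comma, so each value stays in its own column, and opening the file with newline="" is what the csv documentation recommends.

```python
import csv


def save_products(path, products):
    """Write (name, price, image_url) tuples to a CSV file with a header row."""
    # newline="" lets the csv module control line endings;
    # utf-8 avoids encode errors for characters such as the trademark sign.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Item_Name", "Price", "Image"])
        for name, price, image in products:
            # Each value becomes one column; commas inside values get quoted.
            writer.writerow([name.strip(), price.strip(), image.strip()])
```

In the loop above, one option is to append (Name_item, Prin, imgf) to a list and call save_products("Scrapedetails.csv", rows) once after the page loop finishes.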

1 answer:

Answer 0 (score: 0)

Well, I solved it.

from urllib.request import urlopen
from bs4 import BeautifulSoup

f = open("Scrapedetails.csv", "w")
Headers = "Item_Name, Price, Image\n"
f.write(Headers)

for page in range(1,15):
    page_url = ("https://www.newegg.com/Product/ProductList.aspx?"
                "Submit=ENE&N=-1&IsNodeId=1&Description=GTX&bop=And&Page="
                "{}&PageSize=36&order=BESTMATCH").format(page)
    html = urlopen(page_url)
    bs0bj = BeautifulSoup(html, "html.parser")
    page_details = bs0bj.find_all("div", {"class":"item-container"})
    for i in page_details:
        Item_Name = i.find("a", {"class":"item-title"})
        Price = i.find("li", {"class":"price-current"}).find('strong')
        Image = i.find("img")
        Name_item = Item_Name.get_text().strip()
        prin = Price.get_text()
        imgf = Image["src"]# to get the key src 


        print(Name_item)
        print(prin)
        print('https:{}'.format(imgf))
        f.write("{}".format(Name_item).replace(",", "|")+ ",{}".format(prin)+ ",https:{}".format(imgf)+ "\n")
f.close()

This code is for anyone who wants to get started with web scraping in the simplest possible way.
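The answer's key changes are taking only the strong element inside the price li, prefixing "https:" to the protocol-relative image src, and replacing commas in the name with "|". If the strong text itself contains a thousands separator such as 1,349, that row would still split, because only the name has its commas replaced. The sketch below parses items from a hypothetical HTML snippet modeled on the answer's selectors (the real Newegg markup may differ). It assumes the cents sit in a sup next to the strong, skips containers that lack a field, and uses urllib.parse.urljoin for the image URL; parse_item and SAMPLE_HTML are hypothetical names.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup modeled on the answer's selectors; real pages may differ.
SAMPLE_HTML = """
<div class="item-container">
  <a class="item-img"><img src="//images.example.com/gtx1080.jpg"></a>
  <a class="item-title">Example GTX 1080 card, 8GB</a>
  <ul><li class="price-current">$<strong>1,349</strong><sup>.99</sup></li></ul>
</div>
"""


def parse_item(container):
    """Return (name, price, image_url) for one item-container div, or None."""
    title = container.find("a", {"class": "item-title"})
    price_li = container.find("li", {"class": "price-current"})
    img = container.find("img")
    if title is None or price_li is None or img is None or not img.get("src"):
        return None  # skip ads or placeholders missing one of the fields
    strong = price_li.find("strong")
    sup = price_li.find("sup")
    if strong is not None:
        # find("strong") alone keeps only the dollars; append the <sup> cents
        price = strong.get_text()
        if sup is not None:
            price += sup.get_text()
    else:
        price = price_li.get_text()
    # a protocol-relative "//host/path" becomes "https://host/path"
    image_url = urljoin("https://www.newegg.com/", img["src"])
    return title.get_text().strip(), price.strip(), image_url


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
items = [parse_item(div) for div in soup.find_all("div", {"class": "item-container"})]
print(items)
```

The tuples it returns can be written with csv.writer, so the comma in 1,349.99 stays inside one quoted field instead of needing to be replaced.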