Python Beautiful Soup: scraping multiple tables?

Time: 2018-01-30 02:23:19

Tags: python beautifulsoup

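I am using Python and Beautiful Soup to scrape the Newegg website and collect each product's price, name, and shipping cost. However, when I run the program, the output contains only the first product entry from the site. Can anyone help me figure out what I am doing wrong?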

# import beautiful soup 4 and use urllib to import urlopen
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# url where we will grab the product data
my_url = 'http://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'

# open connection and grab the URL page information, read it, then close it
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# parse html from the page
page_soup = soup(page_html, "html.parser")

# find each product within the item-container class
containers = page_soup.findAll("div",{"class":"item-container"})

# write a file named products.csv with the data returned
filename = "products.csv"
f = open(filename, "w")

# create headers for products
headers = "price, product_name, shipping\n"

f.write("")

# define containers based on location on webpage and their DOM elements
for container in containers:
    price_container = container.findAll("li", {"class":"price-current"})
    price = price_container[0].text.strip("|")

    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].text

    shipping_container = container.findAll("li",{"class":"price-ship"})
    shipping = shipping_container[0].text.strip()

# print each product with the brand, product name and shipping cost
print("price: " + price)
print("product name: " + product_name)
print("shipping: " + shipping)

# when writing each section, add a comma, replace comma with pipe,
# add new line after shipping
f.write(price + "," + product_name.replace(",", "|") + "," + shipping + 
"\n")

f.close()

3 Answers:

Answer 0 (score: 1)

The print and write statements should be placed inside the for block.

# define containers based on location on webpage and their DOM elements
for container in containers:
    price_container = container.findAll("li", {"class":"price-current"})
    price = price_container[0].text.strip("|")

    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].text

    shipping_container = container.findAll("li", {"class":"price-ship"})
    shipping = shipping_container[0].text.strip()

    # print each product with the brand, product name and shipping cost
    print("price: " + price)
    print("product name: " + product_name)
    print("shipping: " + shipping)

    # when writing each section, add a comma, replace comma with pipe,
    # add new line after shipping
    f.write(price + "," + product_name.replace(",", "|") + "," + shipping +  "\n")

f.close()
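
As a rough illustration of the fix above, the sketch below runs the same kind of loop against a small inline HTML sample instead of the live site. The sample markup, the helper names `extract_rows` and `write_csv`, and the use of the standard `csv` module are assumptions for illustration and are not part of the original answer; only the class names come from the question. It also assumes every container holds all three elements, which a real page may not guarantee.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup; only the class names are taken from the question.
SAMPLE_HTML = """
<div class="item-container">
  <a class="item-title">Card A, 8GB</a>
  <ul>
    <li class="price-current">$199.99</li>
    <li class="price-ship">Free Shipping</li>
  </ul>
</div>
<div class="item-container">
  <a class="item-title">Card B</a>
  <ul>
    <li class="price-current">$299.99</li>
    <li class="price-ship">$4.99 Shipping</li>
  </ul>
</div>
"""


def extract_rows(html):
    """Return one (price, product_name, shipping) tuple per item-container."""
    page_soup = BeautifulSoup(html, "html.parser")
    rows = []
    for container in page_soup.find_all("div", {"class": "item-container"}):
        price = container.find("li", {"class": "price-current"}).text.strip()
        product_name = container.find("a", {"class": "item-title"}).text.strip()
        shipping = container.find("li", {"class": "price-ship"}).text.strip()
        # The row is collected inside the loop, so it happens once per product.
        rows.append((price, product_name, shipping))
    return rows


def write_csv(rows, out_file):
    """Write a header line followed by one CSV line per row."""
    writer = csv.writer(out_file)
    writer.writerow(["price", "product_name", "shipping"])
    writer.writerows(rows)


rows = extract_rows(SAMPLE_HTML)
buffer = io.StringIO()  # stands in for open("products.csv", "w", newline="")
write_csv(rows, buffer)
print(buffer.getvalue())
```

Because `csv.writer` quotes any field that contains a comma, this variant does not need the `replace(",", "|")` step from the question.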

Answer 1 (score: 0)

You can try this:

Google.Apis.Discovery.v1

Output:

using Google.Apis.Discovery.v1;

Answer 2 (score: 0)

You need either a second loop that calls f.write(), or to do the write inside your first for loop.

You are writing only one product to the file because that line of code executes only once.

The simplest solution is to move

f.write(price + "," + product_name.replace(",", "|") + "," + shipping + 
"\n")

to just after

 shipping = shipping_container[0].text.strip()

Be sure to indent it to match the rest of the for loop body.

Do yourself a favor and read the Python documentation: https://docs.python.org/3/
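
To make the indentation point concrete, here is a minimal sketch that needs no network access or third-party packages. The `products` list is hypothetical stand-in data for the values scraped from each container, and the function names are made up for this illustration. A statement placed after the loop runs once and sees only the values left over from the final iteration, while the same statement indented into the loop runs once per product.

```python
import io

# Hypothetical stand-in data for the values scraped from each container.
products = [
    ("$199.99", "Card A", "Free Shipping"),
    ("$299.99", "Card B", "$4.99 Shipping"),
    ("$399.99", "Card C", "Free Shipping"),
]


def write_after_loop(items):
    """Mirror the question's layout: the write sits outside the loop."""
    out = io.StringIO()
    for price, product_name, shipping in items:
        pass  # the three names are rebound on every iteration
    # Runs once, using whatever the names held when the loop finished.
    out.write(price + "," + product_name + "," + shipping + "\n")
    return out.getvalue()


def write_inside_loop(items):
    """Apply the fix: the write is indented into the loop body."""
    out = io.StringIO()
    for price, product_name, shipping in items:
        # Runs once per item, so every product gets its own line.
        out.write(price + "," + product_name + "," + shipping + "\n")
    return out.getvalue()


print(write_after_loop(products))
print(write_inside_loop(products))
```

In this sketch the dedented version produces a single line holding the last item, and the indented version produces one line per item.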