Python web scraping: separating data into different columns (Excel)

Date: 2018-07-18 16:14:42

Tags: python excel web-scraping multiple-columns export-to-csv

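Dear Stack Overflow community,

I recently started playing around with Python. I have learned a lot from watching YouTube videos and browsing this platform, but I cannot solve my problem.

I hope you can help me.

I am trying to scrape information from a website using Python (Anaconda) and put it into a CSV file. I tried to separate the columns by adding "," in my script. However, when I open the CSV file, all the data ends up in one column (A). Instead, I want the data split into different columns (A and B, and later C, D, E, F, etc. when I add more information).

What do I have to add to this code: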

filename = "brands.csv"
f = open(filename, "w")

headers = "brand, shipping\n"

f.write(headers)

for container in containers:
    brand_container = container.findAll("h2",{"class":"product-name"})
    brand = brand_container[0].a.text

    shipping_container = container.findAll("p",{"class":"availability in-stock"})
    shipping = shipping_container[0].text.strip()

    print("brand: " + brand)
    print("shipping: " + shipping)

    f.write(brand + "," + shipping +  "," + "\n")

f.close()

Thanks for your help!

Kind regards,


Complete script after Game0ver's suggestion:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.scraped-website.com'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")   

# grabs each product
containers = page_soup.findAll("li",{"class":"item last"})
container = containers[0]

import csv

filename = "brands.csv"
with open(filename, 'w') as csvfile:
    fieldnames = ['brand', 'shipping']
    # define your delimiter
    writer = csv.DictWriter(csvfile, delimiter=',', fieldnames=fieldnames)
    writer.writeheader()

for container in containers:
    brand_container = container.findAll("h2",{"class":"product-name"})
    brand = brand_container[0].a.text

    shipping_container = container.findAll("p",{"class":"availability in-stock"})
    shipping = shipping_container[0].text.strip()

    print("brand: " + brand)
    print("shipping: " + shipping)

As I mentioned, this code does not work. Am I doing something wrong?

3 answers:

Answer 0: (score: 1)

You would be better off using Python's csv module to do this:

import csv

filename = "brands.csv"
with open(filename, 'w', newline='') as csvfile:
    fieldnames = ['brand', 'shipping']
    # define your delimiter
    writer = csv.DictWriter(csvfile, delimiter=',', fieldnames=fieldnames)
    writer.writeheader()
    # write rows...

Answer 1: (score: 0)

Try wrapping the values in double quotes, for example:

f.write('"'+brand + '","' + shipping +  '"\n')

That said, there are better ways to handle this common task.
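
One such alternative, sketched here under the assumption that the goal is simply to have every field quoted: the standard library's `csv.writer` accepts `quoting=csv.QUOTE_ALL`, which quotes each field and also escapes any double quotes inside a value, something manual string concatenation does not do. The `brand` and `shipping` values below are made up for illustration.

```python
import csv
import io

# Hypothetical values: one contains a comma, the other an embedded double quote.
brand = 'Acme, Inc.'
shipping = 'Ships in 24" box'

# An in-memory buffer is used here instead of a real file.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(["brand", "shipping"])
writer.writerow([brand, shipping])

output = buffer.getvalue()

# Parsing the text back recovers the original values unchanged.
parsed = list(csv.reader(io.StringIO(output)))
```

The embedded quote is written as a doubled quote (`""`), which is the convention CSV readers expect, so the comma inside `Acme, Inc.` does not split the field.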

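Answer 2: (score: 0)

You can go with either of the two approaches shown below. Since the URL in your script is not available, I have used a working one instead.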

import csv
import requests
from bs4 import BeautifulSoup

url = "https://yts.am/browse-movies"

response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')

with open("movieinfo.csv", 'w', newline="") as f:
    writer = csv.DictWriter(f, ['name', 'year'])
    writer.writeheader()

    for row in soup.select(".browse-movie-bottom"):
        d = {}
        d['name'] = row.select_one(".browse-movie-title").text
        d['year'] = row.select_one(".browse-movie-year").text
        writer.writerow(d)

Or you can try the following:

soup = BeautifulSoup(response.content, 'lxml')

with open("movieinfo.csv", 'w', newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['name','year'])

    for row in soup.select(".browse-movie-bottom"):
        name = row.select_one(".browse-movie-title").text
        year = row.select_one(".browse-movie-year").text
        writer.writerow([name,year])