Looping through items and saving to .xlsx file, only saves the last value using web scraping?

Date: 2017-01-15 12:27:29

Tags: python excel web-scraping openpyxl

I am very new to Python and am trying to learn as much as I can by working on projects that keep me interested.

In the code below, I am trying to scrape information from a website and write every company's name, address, and so on into an Excel file. I think I need to define which Excel row and column each company is written to on each iteration, but I am drawing a blank on how exactly to go about it.

import requests, os
from bs4 import BeautifulSoup
from openpyxl import Workbook
from openpyxl import load_workbook


url = "https://dir.indiamart.com/search.mp?ss=Power+Distribution+Transformers"
r = requests.get(url)

soup = BeautifulSoup(r.content)

links = soup.find_all("a")

for link in links:
    print("<a href='%s'>%s</a>" % (link.get("href"), link.text))


g_data = soup.find_all("div", {"class": "nes"})

c = []
d = []
for item in g_data:
    c.append(item.contents[3].text)
    d.append(item.contents[1].text)
    wb = load_workbook("Trial.xlsx")
    ws1 = wb.get_sheet_by_name("Sheet1")
    for i in c:
        ws1["A2"] = i
        wb.save("Trial.xlsx")
        for x in d:
            ws1["B2"] = x
            wb.save("Trial.xlsx")

1 Answer:

Answer 0 (score: 1)

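import requests, bs4, re, csv

url = 'https://dir.indiamart.com/search.mp?ss=Power+Distribution+Transformers'
r = requests.get(url)
soup = bs4.BeautifulSoup(r.text, 'lxml')
# Each company listing sits in its own <div class="lst"> block.
blocks = soup.find_all('div', class_='lst')

# newline='' stops the csv module from writing blank lines between rows on Windows.
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for b in blocks:
        name = b.find(class_='cnm').get_text(strip=True)
        addr = b.find(class_='clg').get_text(strip=True)
        # Raw string so that \d reaches the regex engine as a digit class.
        call = b.find(class_='ls_co phn').find(string=re.compile(r'\d+')).strip()
        # One row per company: name, address, phone number.
        writer.writerow([name, addr, call])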

Output:

"Padmavahini Transformers Private Limited, Coimbatore","Saravanampatti, CoimbatoreS. F. No. 353/1, Door No. 7/140, Ruby Matriculation School Road Keeranatham, Saravanampatti,Coimbatore-641035,Tamil Nadu",8071681548
Guru Teg Bahadur Metal Works,"Shimlapuri, LudhianaNo. 1621, Street No. 4, Kwality Road, Near Kwality Chowk Shimlapuri,Ludhiana-141003,Punjab",8079452881
Servokon Systems Ltd.,"Servokon House, New DelhiServokon House, C-13, Radhu Palace Road Opposite Scope Minar,New Delhi-110092,Delhi",8048077499
Muskaan Power Infrastructure Ltd,"Dhandari Kalan, LudhianaSua Road, Industrial Area - C, Dhandari Kalan,Ludhiana-141014,Punjab",8079465606
Tamilnadu Electricals,"Ambattur Industrial Estate, ChennaiNo. 95 - H, (SP) Ambattur Industrial Estate,Chennai-600058,Tamil Nadu",8046073728
L. D. Power Transformers Pvt. Ltd.,"Sector 3, NoidaA-9, Sector- 59, Phase- 3,Noida-201301,Uttar Pradesh",8048111124
Western Electricals (pvt.) Ltd.,"Kaman, PalgharS. No. 6, H. No. 1, ( Part), Behind Shanti Metal, Near Sai Service, Vasai - Kaman Road Sativali Village, Taluka Vasai (E),Palghar-401208,Maharashtra",8071683491

You can store the data in a CSV file and then open it in Excel. The csv module is easy to use.
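
As for why the openpyxl code in the question keeps only the last value: every iteration writes to the same two cells, A2 and B2, so each company overwrites the previous one. A minimal sketch of the row bookkeeping follows, assuming the two scraped lists line up item by item. The helper name `cell_assignments` and the sample data are made up for illustration and are not part of openpyxl or the original code.

```python
def cell_assignments(column_a, column_b, start_row=2):
    """Return (coordinate, value) pairs, one spreadsheet row per item.

    Hypothetical helper for illustration; it is not part of openpyxl.
    """
    assignments = []
    # zip pairs the two lists item by item; enumerate advances the row
    # number, so no two items share a cell (the question always used row 2).
    for row, (a, b) in enumerate(zip(column_a, column_b), start=start_row):
        assignments.append((f"A{row}", a))
        assignments.append((f"B{row}", b))
    return assignments


# Made-up sample data standing in for the scraped lists c and d.
c = ["first item for column A", "second item for column A"]
d = ["first item for column B", "second item for column B"]

for coordinate, value in cell_assignments(c, d):
    print(coordinate, value)
```

With openpyxl, each pair would be applied as ws1[coordinate] = value, followed by a single wb.save("Trial.xlsx") after the loop instead of one save per cell.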