Problem exporting web URL results to CSV using beautifulsoup3

Time: 2019-01-08 13:47:20

Tags: python-3.x web-scraping beautifulsoup export-to-csv

Question: I am trying to export the results (name, address, phone) to a CSV file, but my CSV code does not return the expected results.

#Import the installed modules
import requests
from bs4 import BeautifulSoup
import json
import re
import csv

#To get the data from the web page we will use requests get() method
url = "https://www.lookup.pk/dynamic/search.aspx?searchtype=kl&k=gym&l=lahore"
page = requests.get(url)

# To check the http response status code
print(page.status_code)

#Now I have collected the data from the web page, let's see what we got
print(page.text)

#The above data can be viewed in a pretty format by using beautifulsoup's prettify() method. For this we will create a bs4 object and use the prettify method
soup = BeautifulSoup(page.text, 'lxml')
print(soup.prettify())

#Find all DIVs that contain Companies information
product_name_list = soup.findAll("div",{"class":"CompanyInfo"})

#Find all Companies Name under h2tag
company_name_list_heading = soup.findAll("h2")

#Find all Address on page Name under a tag
company_name_list_items = soup.findAll("a",{"class":"address"})

#Find all Phone numbers on page Name under ul
company_name_list_numbers = soup.findAll("ul",{"class":"submenu"})

Create for loops to print out all the company data

for company_address in company_name_list_items:
    print(company_address.prettify())

# Create for loop to print out all company Names
for company_name in company_name_list_heading:
    print(company_name.prettify())

# Create for loop to print out all company Numbers
for company_numbers in company_name_list_numbers:
    print(company_numbers.prettify())

Below is the code to export the results (name, address, and phone number) to CSV

outfile = open('gymlookup.csv','w', newline='')

writer = csv.writer(outfile)

writer.writerow(["name", "Address", "Phone"])

product_name_list = soup.findAll("div",{"class":"CompanyInfo"})
company_name_list_heading = soup.findAll("h2")
company_name_list_items = soup.findAll("a",{"class":"address"})
company_name_list_numbers = soup.findAll("ul",{"class":"submenu"})

These are the for loops used to iterate over the data.

for company_name in company_name_list_heading:
    names = company_name.contents[0]

for company_numbers in company_name_list_numbers:
    names = company_numbers.contents[1]

for company_address in company_name_list_items:
    address = company_address.contents[1]

    writer.writerow([name, Address, Phone])

outfile.close()

1 answer:

Answer 0 (score: 1)

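You need to work a bit more on how for loops work, and on the difference between strings, variables, and other data types. You also need to work on taking what you have seen in other Stack Overflow questions and learning to apply it. This is essentially the same as the other 2 questions you already posted, just with a different site to scrape (I didn't flag it as a duplicate, since you're new to Stack Overflow and web scraping, and I remember what it was like trying to learn). I'll still answer your question, but eventually you need to be able to find the answers on your own and learn how to adapt and apply them (coding isn't paint-by-numbers). I do see that you are adapting some of it. Good job finding the "div",{"class":"CompanyInfo"} tag to get the company info.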

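The data you want to pull (name, address, phone) needs to be extracted inside a loop over the div class=CompanyInfo elements/tags, one company block at a time. In theory you could keep it the way you have it now, by putting the names, addresses, and phones into separate lists and then writing to the CSV file from those lists. The risk is that if any company is missing a field, the lists fall out of step and a value ends up next to the wrong company.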
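To make that risk concrete, here is a minimal sketch using plain Python lists. The company names, addresses, and phone numbers are made up for illustration and are not taken from the site; the point is only what happens when three page-wide lists are paired by position and one entry is missing.

```python
# Hypothetical data: three companies, but the second has no phone listed,
# so a page-wide search for phones would return only two entries.
names = ["Gym A", "Gym B", "Gym C"]
addresses = ["1 Main Rd", "2 Canal Rd", "3 Mall Rd"]
phones = ["111-1111", "333-3333"]  # Gym B's phone is missing

# Pairing the lists by position shifts Gym C's phone onto Gym B and drops
# Gym C entirely, because zip() stops at the shortest list.
rows_from_lists = list(zip(names, addresses, phones))

# Keeping each company's fields together (one record per block) avoids this:
# a missing phone stays blank for that company only.
records = [
    {"name": "Gym A", "address": "1 Main Rd", "phone": "111-1111"},
    {"name": "Gym B", "address": "2 Canal Rd", "phone": None},
    {"name": "Gym C", "address": "3 Mall Rd", "phone": "333-3333"},
]
rows_from_blocks = [(r["name"], r["address"], r["phone"] or "") for r in records]

print(rows_from_lists)
print(rows_from_blocks)
```

Looping over each CompanyInfo block and reading the fields from inside it produces the second shape of data, which is why that approach is safer.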

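Here is what the full code looks like. Notice that the variables are stored inside the loop and then written right away. The loop then moves on to the next CompanyInfo block and continues.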

#Import the installed modules
import requests
from bs4 import BeautifulSoup
import csv

#To get the data from the web page we will use requests get() method
url = "https://www.lookup.pk/dynamic/search.aspx?searchtype=kl&k=gym&l=lahore"
page = requests.get(url)

# To check the http response status code
print(page.status_code)

#Now I have collected the data from the web page, let's see what we got
print(page.text)

#The above data can be viewed in a pretty format by using beautifulsoup's prettify() method. For this we will create a bs4 object and use the prettify method
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.prettify())

outfile = open('gymlookup.csv','w', newline='')
writer = csv.writer(outfile)
writer.writerow(["Name", "Address", "Phone"])


#Find all DIVs that contain Companies information
product_name_list = soup.findAll("div",{"class":"CompanyInfo"})

# Now loop through those elements
for element in product_name_list:

    # Takes 1 block of the "div",{"class":"CompanyInfo"} tag and finds/stores name, address, phone
    name = element.find('h2').text
    address = element.find('address').text.strip()
    phone = element.find("ul",{"class":"submenu"}).text.strip()

    # writes the name, address, phone to csv
    writer.writerow([name, address, phone])

    # then moves on to the next "div",{"class":"CompanyInfo"} tag and repeats

outfile.close()
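One caveat with the loop above: find() returns None when a block lacks a tag, so calling .text on the result would raise an AttributeError for a company with, say, no phone list. The following is a rough sketch of one way to guard against that, not a tested fix for the live site. SAMPLE_HTML is hypothetical markup that only mimics the structure the code above expects, and text_or_blank and extract_rows are helper names introduced here for illustration.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup modeled on the structure the answer's code expects;
# the real page may differ.
SAMPLE_HTML = """
<div class="CompanyInfo">
  <h2>Gym A</h2>
  <address> 1 Main Rd, Lahore </address>
  <ul class="submenu"><li>111-1111</li></ul>
</div>
<div class="CompanyInfo">
  <h2>Gym B</h2>
  <address>2 Canal Rd, Lahore</address>
</div>
"""


def text_or_blank(tag):
    """Return the stripped text of a tag, or '' if find() returned None."""
    return tag.get_text(strip=True) if tag is not None else ""


def extract_rows(html):
    """Build one [name, address, phone] row per CompanyInfo block."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="CompanyInfo"):
        rows.append([
            text_or_blank(block.find("h2")),
            text_or_blank(block.find("address")),
            text_or_blank(block.find("ul", class_="submenu")),
        ])
    return rows


rows = extract_rows(SAMPLE_HTML)

# Written to an in-memory buffer so the sketch runs without touching disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Name", "Address", "Phone"])
writer.writerows(rows)
print(buffer.getvalue())
```

When writing to a real file, opening it with `with open('gymlookup.csv', 'w', newline='') as outfile:` closes the file automatically, even if an error occurs partway through the loop.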