I am trying to scrape information from a website written in HTML. My code is below:
#Import packages
import urllib.request
from bs4 import BeautifulSoup
import pandas as pd
import csv
#For loop to scrape details of power plants
lst=[]
for i in range(1,46624):
    pid=str(i)
    url="http://www.globalenergyobservatory.com/form.php?pid=" + pid
    page=urllib.request.urlopen(url)
    soup=BeautifulSoup(page,'html.parser')
    #Distinguish power plants to different types of primary fuel
    types=soup.find(id="Type")
    power_types=types["value"]
    #No of units of power plant
    unit=soup.find(id="Abstract_Block")
    unit_breakdown_describe=unit.get_text()
    #Name of power plant
    name=soup.find(id="Name")
    name_value=name["value"]
    #Status of power plant
    status1=soup.find(id="Status_of_Plant_enumfield_itf")
    status2=status1.find(selected="selected")
    status=status2["value"]
    #Latitude & longitude of power plant
    lat=soup.find(id="Latitude_Start")
    latitude=lat["value"]
    long=soup.find(id="Longitude_Start")
    longitude=long["value"]
    #Capacity of power plant
    cap=soup.find(id="Design_Capacity_(MWe)_nbr")
    capacity=cap["value"]
lst.append([name_value,status,power_types,capacity,latitude,longitude,unit_breakdown_describe])
df=pd.DataFrame(lst) #Convert to dataframe for storage
df.columns=['Name','Status','Type_of_power_plant','Capacity','Latitude','Longitude','no_of_units']
#Convert to csv file
df.to_csv('power.csv',sep='\t')
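I am trying to scrape the information and put it into a DataFrame so that I can convert it to a CSV file. Printing individual values (for example, print(capacity)) raises no errors, but I get an error when I try to convert to a CSV file. I know there are similar threads on this topic, but I would appreciate any help.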
Answer 0 (score: 0)
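The only error I can see is that the line that appends the data to the list is not indented correctly. It should be inside the loop; then the CSV will be populated properly.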
lst=[]
for i in range(1,46624):
    # Your code here
    #Capacity of power plant
    cap=soup.find(id="Design_Capacity_(MWe)_nbr")
    capacity=cap["value"]
    lst.append([name_value,status,power_types,capacity,latitude,longitude,unit_breakdown_describe])
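To see the effect of the indentation in isolation, here is a minimal sketch that runs without network access. The helper names fetch_row, collect_rows and rows_to_tsv are hypothetical and not part of the original code: fetch_row stands in for the per-page scraping and returns made-up values. The sketch also uses the standard-library csv module instead of pandas, purely to keep it dependency-free.

```python
import csv
import io


def fetch_row(pid):
    # Hypothetical stand-in for the scraping code in the question.
    # It returns made-up values so the sketch runs offline.
    return [f"Plant {pid}", "Operating", "Coal", str(100 * pid)]


def collect_rows(pids):
    rows = []
    for pid in pids:
        row = fetch_row(pid)
        # Indented inside the loop: one row is appended per iteration.
        # At the outer level it would run once, after the loop ends,
        # and only the last plant would be stored.
        rows.append(row)
    return rows


def rows_to_tsv(rows, header):
    # Write the header and rows as tab-separated text, mirroring sep='\t'.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


rows = collect_rows(range(1, 4))
tsv = rows_to_tsv(rows, ["Name", "Status", "Type_of_power_plant", "Capacity"])
print(tsv)
```

With the append inside the loop, rows holds one entry per pid, so the output has one line per plant plus the header.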