My code runs, but I need the information split into columns. Can anyone help me? Thanks in advance.
from bs4 import BeautifulSoup
import csv
import requests
# Request webpage content
result = requests.get('https://www.solar.com/learn/solar-panel-cost/')
# Save content in var
src = result.content
# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(src, 'lxml')
# Open CSV
file = open('priceperwatt', 'w')
writer = csv.writer(file)
for tr in soup.findAll('tr'):
    rowtext = tr.get_text()
    writer.writerow([rowtext])
file.close()
Answer 0 (score: 1)
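I made a few improvements to your code. The main problem is that the scraped data does not fit neatly into an array, because the first few rows do not all contain the same number of elements. Once you reach ['State', 'Market Price Per Watt', 'Solar.com Price Per Watt'], though, you can use that row as the column headers. My changes include opening the file with the newline='' keyword argument for both the csv writer and the csv reader, so that the csv module itself controls how rows are terminated.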
from bs4 import BeautifulSoup
import requests
import csv
# Request webpage content
result = requests.get('https://www.solar.com/learn/solar-panel-cost/')
# Save content in var
src = result.content
# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(src, 'lxml')
# Write one CSV record per table row
with open('priceperwatt', 'w', newline='') as file:
    writer = csv.writer(file)
    for tr in soup.findAll('tr'):
        rowtext = tr.get_text()
        writer.writerow([rowtext])
# Read the records back and split each one into its cells
with open('priceperwatt', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        row = ''.join(row).strip('\n').split('\n')
        print(row)
Output:
['Solar Price Per Watt', 'Solar Price Per Kilowatt Hour']
['GROSS system cost / Total system wattage', 'NET system cost / Total lifetime system production']
['Useful for comparing solar quotes against one another', 'Useful for comparing solar versus utility bill']
['Pertains to the POWER of a system', 'Pertains to the PRODUCTION of a system']
['Typically $3.00-4.00/watt', 'Typically $0.06-0.08/kWh']
['State', 'Market Price Per Watt', 'Solar.com Price Per Watt']
['Arizona', '$3.61/W', '$3.39/W']
['California', '$4.31/W', '$3.76/W']
['Connecticut', '$3.65/W', '$3.68/W']
['Florida', '$3.45/W', '$2.82/W']
['Massachusetts', '$4.18/W', '$3.92/W']
['Maryland', '$3.93/W', '$3.64/W']
['Minnesota', '$4.61/W', '$3.66/W']
['New Hampshire', '$3.72/W', '$3.37/W']
['New Mexico', '$4.82/W', '$3.56/W']
['Oregon', '$3.79/W', '$3.68/W']
['Texas', '$3.83/W', '$3.17/W']
['Wisconsin', '$3.29/W', '$3.83/W']
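The reading loop works because each record in the file is a single field that still contains the newlines between cells. A minimal sketch of that round trip follows, using an in-memory buffer instead of a file. The row string is hand-written and is only an assumption about what tr.get_text() returns for this table; it was not fetched from the page.

```python
import csv
import io

# Assumed shape of tr.get_text() for one table row: cells separated by newlines.
rowtext = '\nArizona\n$3.61/W\n$3.39/W\n'

# Write the row as a single field; csv quotes it because it contains newlines.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerow([rowtext])

# Read it back: the reader returns one field holding the original string.
buffer.seek(0)
fields = next(csv.reader(buffer))

# Same recovery step as in the answer: drop outer newlines, split on inner ones.
cells = ''.join(fields).strip('\n').split('\n')
print(cells)
```

If a cell were empty or the markup put extra blank lines between cells, the split would produce empty strings, so this step may need adjusting for other tables.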
Finally:
import csv
import pandas as pd
lst = []
with open('priceperwatt', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        row = ''.join(row).strip('\n').split('\n')
        lst.append(row)
# Row 5 holds the column headers; the state rows start at row 6
df = pd.DataFrame(lst[6:], columns=lst[5])
print(df)
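If the goal is a CSV file with real columns rather than a DataFrame, the same list of rows can be written back out with one cell per field. This is only a sketch: it assumes lst has the shape shown in the output above (a few rows are copied here so the example runs on its own), and the output filename priceperwatt_columns.csv is a hypothetical choice.

```python
import csv

# A few rows copied from the output above; in practice, reuse lst as built earlier.
lst = [
    ['Typically $3.00-4.00/watt', 'Typically $0.06-0.08/kWh'],
    ['State', 'Market Price Per Watt', 'Solar.com Price Per Watt'],
    ['Arizona', '$3.61/W', '$3.39/W'],
    ['California', '$4.31/W', '$3.76/W'],
]

header = ['State', 'Market Price Per Watt', 'Solar.com Price Per Watt']
# Locate the header row by value instead of hard-coding its position.
start = lst.index(header)

# Write the header and the state rows, one cell per CSV field.
with open('priceperwatt_columns.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(lst[start:])

# Read the file back to confirm each row now has three separate columns.
with open('priceperwatt_columns.csv', 'r', newline='') as file:
    table = list(csv.reader(file))
print(table[0])
```

Looking the header up by value avoids relying on the header sitting at index 5, which would break if the page added or removed one of the introductory rows.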