How can I put this information into columns?

Time: 2019-11-11 00:03:24

Tags: python beautifulsoup python-3.7

My code runs, but I need the information split into columns. Can anyone help me? Thanks in advance.

from bs4 import BeautifulSoup
import requests
import csv


#Request webpage content
result = requests.get('https://www.solar.com/learn/solar-panel-cost/')

#Save content in var
src = result.content

#soupactivate
soup = BeautifulSoup(src,'lxml')


#Open CSV
file = open('priceperwatt','w')
writer = csv.writer(file)

for tr in soup.findAll('tr'):
    rowtext = tr.get_text()
    writer.writerow([rowtext])

file.close()

1 answer:

Answer 0: (score: 1)

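So I made a few improvements to your code. The main issue is that the scraped data does not fit neatly into an array, because the first few rows do not all contain the same number of elements. Once you reach ['State', 'Market Price Per Watt', 'Solar.com Price Per Watt'], though, you can use that row as the column headers. My changes include opening the CSV file for both writing and reading with the newline kwarg (newline=''), which leaves line-ending handling to the csv module.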

from bs4 import BeautifulSoup
import requests
import csv


#Request webpage content
result = requests.get('https://www.solar.com/learn/solar-panel-cost/')

#Save content in var
src = result.content

#soupactivate
soup = BeautifulSoup(src,'lxml')


#Open CSV
with open('priceperwatt','w', newline='') as file:
    writer = csv.writer(file)

    for tr in soup.findAll('tr'):
        rowtext = tr.get_text()
        writer.writerow([rowtext])

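with open('priceperwatt','r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        # Each record holds a table row's cells separated by newlines;
        # rejoin the fields, trim the outer newlines, then split into a list
        row = ''.join(row).strip('\n').split('\n')
        print(row)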

Output:

['Solar Price Per Watt', 'Solar Price Per Kilowatt Hour']
['GROSS system cost / Total system wattage', 'NET system cost / Total lifetime system production']
['Useful for comparing solar quotes against one another', 'Useful for comparing solar versus utility bill']
['Pertains to the POWER of a system', 'Pertains to the PRODUCTION of a system']
['Typically $3.00-4.00/watt', 'Typically $0.06-0.08/kWh']
['State', 'Market Price Per Watt', 'Solar.com Price Per Watt']
['Arizona', '$3.61/W', '$3.39/W']
['California', '$4.31/W', '$3.76/W']
['Connecticut', '$3.65/W', '$3.68/W']
['Florida', '$3.45/W', '$2.82/W']
['Massachusetts', '$4.18/W', '$3.92/W']
['Maryland', '$3.93/W', '$3.64/W']
['Minnesota', '$4.61/W', '$3.66/W']
['New Hampshire', '$3.72/W', '$3.37/W']
['New Mexico', '$4.82/W', '$3.56/W']
['Oregon', '$3.79/W', '$3.68/W']
['Texas', '$3.83/W', '$3.17/W']
['Wisconsin', '$3.29/W', '$3.83/W']
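
The output above only becomes a list of cells after splitting each row's text on newlines. A minimal sketch of the same idea applied at write time, so the CSV itself has one column per cell: the sample strings are an assumption about what tr.get_text() returns (cells separated by newlines), split_cells is a hypothetical helper, and an in-memory buffer stands in for the 'priceperwatt' file.

```python
import csv
import io

# Assumed shape of tr.get_text() for three rows; values copied from the output above.
raw_rows = [
    "\nState\nMarket Price Per Watt\nSolar.com Price Per Watt\n",
    "\nArizona\n$3.61/W\n$3.39/W\n",
    "\nCalifornia\n$4.31/W\n$3.76/W\n",
]

def split_cells(rowtext):
    """Hypothetical helper: split one row's text into a list of cell strings."""
    return rowtext.strip("\n").split("\n")

# In-memory buffer instead of a file on disk; newline="" disables newline
# translation, matching the newline='' passed to open() in the answer.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
for rowtext in raw_rows:
    # Passing a list with several elements yields one CSV column per element,
    # whereas writer.writerow([rowtext]) puts the whole row into a single field.
    writer.writerow(split_cells(rowtext))

csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, replacing the buffer with open('priceperwatt', 'w', newline='') should behave the same way, provided the page's row text really has this shape.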

Finally:

import csv
import pandas as pd

lst = []
with open('priceperwatt','r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        row = ''.join(row).strip('\n').split('\n')
        lst.append(row)

# lst[5] is the header row; the state rows start at lst[6]
df = pd.DataFrame(lst[6:], columns=lst[5])
print(df)
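
As a possible alternative that skips the intermediate one-column file, the cells can be read from each row directly. This is only a sketch under assumptions: SAMPLE_HTML is a simplified, hypothetical stand-in for the page's markup (the real table structure may differ), table_rows is an illustrative helper, and the built-in 'html.parser' is used instead of 'lxml' so that no extra parser is needed.

```python
import csv
import io
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the tables on the page.
SAMPLE_HTML = """
<table>
  <tr><th>Solar Price Per Watt</th><th>Solar Price Per Kilowatt Hour</th></tr>
  <tr><th>State</th><th>Market Price Per Watt</th><th>Solar.com Price Per Watt</th></tr>
  <tr><td>Arizona</td><td>$3.61/W</td><td>$3.39/W</td></tr>
  <tr><td>California</td><td>$4.31/W</td><td>$3.76/W</td></tr>
</table>
"""

def table_rows(html, width=3):
    """Return each <tr> as a list of cell texts, keeping only rows with `width` cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Collect header and data cells in document order
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if len(cells) == width:  # drops the two-column comparison rows
            rows.append(cells)
    return rows

rows = table_rows(SAMPLE_HTML)

# Write one CSV column per cell; an in-memory buffer stands in for the output file
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Filtering on the number of cells is one way to handle the uneven first rows mentioned in the answer; slicing from the header row onward, as the pandas code does, is another.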