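I am iterating on a process in which I direct Python to a website and instruct it to look up the addresses from my CSV file on that site. I want to tell Python to save the results the website returns for each individual address value to a CSV file.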
from selenium import webdriver
from bs4 import BeautifulSoup
import time
import csv

driver = webdriver.Chrome("C:\Python27\Scripts\chromedriver.exe")
chrome = driver.get('https://etrakit.friscotexas.gov/Search/permit.aspx')

with open('C:/Users/thefirstcolumnedited.csv','r') as f:
    addresses = f.readlines()

    for address in addresses:
        driver.find_element_by_css_selector('#cplMain_txtSearchString').clear()
        driver.find_element_by_css_selector('#cplMain_txtSearchString').send_keys(address)
        driver.find_element_by_css_selector('#cplMain_btnSearch').click()
        time.sleep(5)

    soup = BeautifulSoup(chrome, 'html.parser')

    writer = csv.writer(open('thematchingresults.csv', 'w'))
    writer.writerow(soup)
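For example:
6579 Mountain Sky Rd
The address value above retrieves five rows of data from the website. How can I tell Beautiful Soup to save the results for each address value in the CSV file?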
Answer 0 (score: 1)
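The idea is to write to the CSV file inside the loop (use the "append" mode, a, if you want to produce a single CSV file for all the input addresses). As for extracting the results, I would explicitly wait (time.sleep() is unreliable and often slower than it should be) for the results table element (the element with id="ctl00_cplMain_rgSearchRslts_ctl00"), then use pandas.read_html() to read the table into a dataframe, and then conveniently dump it to a CSV file via .to_csv():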
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# ...

wait = WebDriverWait(driver, 10)

for address in addresses:
    driver.find_element_by_css_selector('#cplMain_txtSearchString').clear()
    driver.find_element_by_css_selector('#cplMain_txtSearchString').send_keys(address)
    driver.find_element_by_css_selector('#cplMain_btnSearch').click()

    # wait for the results table
    table = wait.until(EC.visibility_of_element_located((By.ID, "ctl00_cplMain_rgSearchRslts_ctl00")))

    # make a dataframe and dump the results
    df = pd.read_html(table.get_attribute("outerHTML"))[0]
    with open('thematchingresults.csv', 'a') as f:
        df.to_csv(f)
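To make the contrast with time.sleep() concrete: an explicit wait polls a condition and returns as soon as it holds, instead of always pausing for the full interval. The following is only a rough, Selenium-free sketch of that idea. The helper wait_until, its parameters, and the fake results_loaded condition are hypothetical names introduced for illustration; this is not how WebDriverWait is actually implemented.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper for illustration only; it mimics the general idea of
    an explicit wait and is not Selenium's WebDriverWait.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # return as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Fake "page" whose results become available after about 0.2 seconds.
ready_at = time.monotonic() + 0.2


def results_loaded():
    return "results table" if time.monotonic() >= ready_at else None


start = time.monotonic()
value = wait_until(results_loaded, timeout=5.0, poll_interval=0.05)
elapsed = time.monotonic() - start
print(value, "after", round(elapsed, 2), "s")
```

In this sketch the call returns shortly after the condition becomes true, whereas a fixed time.sleep(5) would always take five seconds and would still fail if the page needed longer.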
For the single "6579 Mountain Sky Rd" address, the contents of thematchingresults.csv after running the script would be:
,Permit Number,Address,Street Name,Applicant Name,Contractor Name,SITE_SUBDIVISION,RECORDID
0,B13-2809,6579 MOUNTAIN SKY RD,MOUNTAIN SKY RD,SHADDOCK HOMES LTD,SHADDOCK HOMES LTD,PCR - SHERIDAN,MAC:1308050328358768
1,B13-4096,6579 MOUNTAIN SKY RD,MOUNTAIN SKY RD,MIRAGE CUSTOM POOLS,MIRAGE CUSTOM POOLS,PCR - SHERIDAN,MAC:1312030307087756
2,L14-1640,6579 MOUNTAIN SKY RD,MOUNTAIN SKY RD,TDS IRRIGATION,TDS IRRIGATION,SHERIDAN,ECON:140506012624706
3,P14-0018,6579 MOUNTAIN SKY RD,MOUNTAIN SKY RD,MIRAGE CUSTOM POOLS,,SHERIDAN,LCR:1401130949212891
4,ROW14-3205,6579 MOUNTAIN SKY RD,MOUNTAIN SKY RD,Housley Group,Housley Group,,TLW:1406190424422330
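Because df.to_csv(f) is called once per address in append mode, each address would presumably add its own header row (and index column) to the file, as in the output above. If a single header is preferred, one option is to write it only when the file is new or empty. The sketch below shows that pattern with the standard csv module rather than pandas. The function append_rows, the demo file location, and the trimmed three-column layout are assumptions made for illustration; the sample rows are copied from the output above.

```python
import csv
import os
import tempfile

HEADER = ["Permit Number", "Address", "Applicant Name"]


def append_rows(path, rows, header=HEADER):
    """Append rows to the CSV at `path`; write the header only if the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline="" is what the csv module documentation recommends for files it writes
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(header)
        writer.writerows(rows)


# Stand-in for scraped results: one list of rows per search.
# The same address is reused for both batches to avoid inventing data.
results_per_search = [
    [["B13-2809", "6579 MOUNTAIN SKY RD", "SHADDOCK HOMES LTD"],
     ["B13-4096", "6579 MOUNTAIN SKY RD", "MIRAGE CUSTOM POOLS"]],
    [["L14-1640", "6579 MOUNTAIN SKY RD", "TDS IRRIGATION"]],
]

demo_path = os.path.join(tempfile.mkdtemp(), "demo_results.csv")
for rows in results_per_search:
    append_rows(demo_path, rows)  # called once per search, inside the loop

with open(demo_path, newline="") as f:
    written = list(csv.reader(f))
print(written[0])
```

With pandas, the same effect should be achievable by passing header=False to to_csv() on every write after the first, and index=False if the leading index column is not wanted.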
Hope this is a good starting point.