Scraping table data into a .csv file

Time: 2017-04-24 00:53:24

Tags: python python-2.7 selenium web-scraping beautifulsoup

I can print the table, but I can't get it into a .csv file. I'm a beginner and am learning more every day.

How can I screen-scrape this data into a CSV file?

# Standard library imports
import sys

# The wget module
import wget

# The BeautifulSoup module
from bs4 import BeautifulSoup

# The selenium module
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome() # to use Firefox instead, replace Chrome() with Firefox()
driver.get("https://www.arcountydata.com/county.asp?county=Benton") # load the web page
# Open the Assessor search form (click() returns None, so there is nothing to assign)
driver.find_element_by_xpath("//*[@id='Assessor']/div/div[2]/a/i").click()
# Fill in the owner name field of the search form
elem = driver.find_element_by_id("OwnerName") # Find the owner name input field
elem.send_keys("Roth Family Inc") # Type the owner name to search for

# Submit the search
driver.find_element_by_xpath("//*[@id='Search']").click()


src = driver.page_source # gets the html source of the page

parser = BeautifulSoup(src,"lxml") # initialize the parser and parse the source "src"


# Find the results table by its full class attribute
table = parser.find("table", attrs={"class" : "table table-striped-yellow table-hover table-bordered table-condensed table-responsive"})

f = open('/output.csv', 'w')

parcel=""
owner=""
ptype=""
site_address=""
location=""
acres=""

summons =[]
#print table

list_of_rows = []
for row in table.findAll('tr')[1:]:
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace(" ", "")
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)
print list_of_rows

driver.close() # closes the browser window

1 Answer:

Answer 0 (score: 1)

Nice to see a new member, and good luck! You did a great job, and your code is well commented, which shows that you understand it. Well done!

import csv

This is a Python module that makes it easy to read and write CSV files, so we import it first.

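with open(name_csv + '.csv', 'w+') as csvfile:  # name_csv: the output file name, without its extension
    spamwriter = csv.writer(csvfile, delimiter=',')
    # list_of_rows is a list of lists, so use writerows():
    # it writes one CSV line per inner list, with the cells separated by commas
    spamwriter.writerows(list_of_rows)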
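
Because list_of_rows is a list of lists, it matters whether it is passed to writerow() or writerows(). The following is a minimal sketch of the difference, written for Python 3 and using an in-memory buffer instead of a file. The sample values and the helper names one_row_text and many_rows_text are made up for illustration and are not taken from the scraped page.

```python
import csv
import io

# Made-up placeholder data standing in for list_of_rows from the question.
list_of_rows = [
    ["parcel-1", "owner-1", "type-1"],
    ["parcel-2", "owner-2", "type-2"],
]

def one_row_text(rows):
    """CSV text produced by a single writerow() call on a list of lists."""
    buffer = io.StringIO()
    # Each inner list is converted with str() and becomes one field of a single line.
    csv.writer(buffer, delimiter=",").writerow(rows)
    return buffer.getvalue()

def many_rows_text(rows):
    """CSV text produced by writerows(): one line per inner list."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=",").writerows(rows)
    return buffer.getvalue()

single_line = one_row_text(list_of_rows)
table_text = many_rows_text(list_of_rows)
print(single_line)
print(table_text)
```

With writerow() the whole table ends up on one line, while writerows() gives one line per table row, which is usually what a spreadsheet expects.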

Edit:

with open('somecsv.csv','w+') as csvfile:
    # Open the CSV file and use '|' instead of ',' as the delimiter (the character that separates fields)
    spamwriter = csv.writer(csvfile, delimiter='|')
    for row in table.findAll('tr')[::2]:
        list_of_cell = []
        for cell in row.findAll('td')[:5]:
            text = cell.text.replace(" ", "").strip()
            text = text.replace('''...\n\n\n'''," |") # Replaces the text before "Lot" with a '|' separator
            text = text.replace(''':\n''',':') # Keeps a label and its value together so the next replace doesn't split them
            text = text.replace('''\n''','|') # Turns the remaining newlines into '|' separators (e.g. before "Block")
            list_of_cell.append(text)
        print(list_of_cell)
        spamwriter.writerow(list_of_cell)
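
One caveat with inserting '|' into the cell text: with its default settings, csv.writer quotes any field that contains the delimiter, so the inserted '|' characters stay inside one quoted field rather than starting new columns. A possible alternative, sketched here for Python 3 with an in-memory buffer, is to split each cell's text into separate fields first and let csv.writer add the delimiters. The helper names split_cell and write_rows and the sample text are hypothetical, not part of the original answer.

```python
import csv
import io

def split_cell(text):
    """Split one cell's text into fields, one per non-empty line."""
    return [part.strip() for part in text.splitlines() if part.strip()]

def write_rows(cell_texts_per_row, delimiter="|"):
    """Write each table row as one CSV line, expanding multi-line cells into fields."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter)
    for cell_texts in cell_texts_per_row:
        fields = []
        for text in cell_texts:
            fields.extend(split_cell(text))
        writer.writerow(fields)
    return buffer.getvalue()

# Made-up placeholder cell text: the first cell spans several lines.
sample = [["owner-1\nlot-1\nblock-1", "type-1"]]
result = write_rows(sample)
print(result)

# For comparison: a field that already contains the delimiter gets quoted.
quoted_buffer = io.StringIO()
csv.writer(quoted_buffer, delimiter="|").writerow(["a|b", "c"])
quoted_line = quoted_buffer.getvalue()
print(quoted_line)
```

In the scraping loop, the argument to write_rows would be the list of cell.text values for each row; whether splitting on newlines matches the layout of the real page is an assumption that needs checking against the actual HTML.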