Populating a MySQL table with scraped data

Asked: 2016-04-03 17:24:18

Tags: python mysql web-scraping beautifulsoup

I am using Python 3, MySQL, Sequel Pro and BeautifulSoup.

Put simply, I want to create an SQL table and then insert the downloaded data into it.

I used the answer to "Beautiful soup webscrape into mysql" as a template for the SQL part, but it does not work.

The error thrown is:

line 86
    finally:
SyntaxError: invalid syntax

When I comment out the last finally: (to see whether the rest of the code works), I get:

InternalError: (1054, "Unknown column 'address' in 'field list'") 

Another error I often get is:

ProgrammingError: (1146, "Table 'simple_scrape.simple3' doesn't exist"), although I can't remember exactly what change I made to trigger it.

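Finally, I started learning to program less than four weeks ago (not just Python, but programming in general). If you are wondering why I did something stupid or inefficient, it is almost certainly because that was the first way I got it to work! Please help!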

Code:



from selenium import webdriver

#Guess BER Number
for i in range(108053983,108053985):
    try:    
#        ber_try = 100000000 
        ber_try =+i
#Open page & insert BER Number
        browser = webdriver.Firefox()
        type(browser)
        browser.get('https://ndber.seai.ie/pass/ber/search.aspx')
        ber_send = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_dfSearch_txtBERNumber')
        ber_send.send_keys(ber_try)
        
 #click search
        form = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_dfSearch_Bottomsearch')
        form.click()
        

#click intermediate page
        form = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_gridRatings_gridview_ctl02_ViewDetails')
        form.click()
               
#scrape the page
        import bs4
        
    
        
      
        soup = bs4.BeautifulSoup(browser.page_source)
        
        
        # First Section
        ber_dec = soup.find('fieldset', {'id':'ctl00_DefaultContent_BERSearch_fsBER'})
        
        
        address = ber_dec.find('div', {'id':'ctl00_DefaultContent_BERSearch_dfBER_div_PublishingAddress'})
        address = (address.get_text(', ').strip())
        print(address)
        
        
        date_issue = ber_dec.find('span', {'id':'ctl00_DefaultContent_BERSearch_dfBER_container_DateOfIssue'}) 
        date_issue = date_issue.get_text().strip()
        print(date_issue)
        
    except:  
        print('Invalid BER Number:', ber_try)
        browser.quit()
   
       
     #connecting to mysql       

  
    finally:
            import pymysql.cursors
            from pymysql import connect, err, sys, cursors
     
    #Making the connection
            connection = pymysql.connect(host = '127.0.0.1',
                                        port = 3306,
                                        user = 'root',
                                        passwd = 'root11',
                                        db = 'simple_scrape',
                                        cursorclass=pymysql.cursors.DictCursor);

            with connection.cursor() as cursor:
                sql= """CREATE TABLE `simple3`(
                (
                `ID` INT AUTO_INCREMENT NOT NULL,
                `address` VARCHAR( 200 ) NOT NULL,
                `date_issue` VARCHAR( 200 ) NOT NULL,
                
                PRIMARY KEY ( `ID` )
            )Engine = MyISAM)"""
        
                sql = "INSERT INTO `simple3` (`address`, `date_issue`) VALUES (%s, %s)"
                cursor.execute(sql, (address, date_issue))
            connection.commit()
    finally:
            connection.close()
    
    browser.quit()
        




1 Answer:

Answer 0 (score: 1)

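The problem: the CREATE TABLE statement is assigned to sql but never executed, so the table is never actually created. Execute it before the INSERT: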

            # Note: only one pair of parentheses around the column definitions.
            sql = """CREATE TABLE simple3
            (
            ID INT AUTO_INCREMENT NOT NULL,
            address VARCHAR( 200 ) NOT NULL,
            date_issue VARCHAR( 200 ) NOT NULL,

            PRIMARY KEY ( ID )
            ) ENGINE = MyISAM"""
            # Added this line since your table was not being created.
            cursor.execute(sql)

            sql = "INSERT INTO simple3 (address, date_issue) VALUES (%s, %s)"
            cursor.execute(sql, (address, date_issue))
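
Two related points from the question may be worth handling at the same time. A `try` statement accepts only one `finally` clause, which is what triggers the `SyntaxError`, and a plain `CREATE TABLE` fails on the second pass through the loop because the table already exists by then. Below is a minimal sketch of one way to arrange this, not a drop-in replacement. The helper names `ensure_table`, `store_row` and `scrape_all` are made up for illustration, and `open_browser` and `scrape` are hypothetical callables standing in for the Selenium and BeautifulSoup steps in the question. `connection` is assumed to be any DB-API connection whose driver uses `%s` placeholders, as PyMySQL does.

```python
from contextlib import closing

# IF NOT EXISTS lets the statement run safely on every start-up.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS simple3 (
    ID INT AUTO_INCREMENT NOT NULL,
    address VARCHAR(200) NOT NULL,
    date_issue VARCHAR(200) NOT NULL,
    PRIMARY KEY (ID)
) ENGINE = MyISAM
"""
INSERT_SQL = "INSERT INTO simple3 (address, date_issue) VALUES (%s, %s)"


def ensure_table(connection):
    """Create simple3 if it is missing (hypothetical helper)."""
    with closing(connection.cursor()) as cursor:
        cursor.execute(CREATE_SQL)
    connection.commit()


def store_row(connection, address, date_issue):
    """Insert one scraped row; the driver escapes the two values."""
    with closing(connection.cursor()) as cursor:
        cursor.execute(INSERT_SQL, (address, date_issue))
    connection.commit()


def scrape_all(ber_numbers, open_browser, scrape, connection):
    """Scrape each BER number and store the rows that were found.

    open_browser() and scrape(browser, ber) are assumptions standing in
    for the Selenium/BeautifulSoup code in the question.
    """
    ensure_table(connection)
    stored = []
    for ber in ber_numbers:
        browser = open_browser()
        try:
            address, date_issue = scrape(browser, ber)
        except Exception as exc:
            # Nothing was scraped, so there is nothing to insert.
            print('Invalid BER Number:', ber, exc)
        else:
            # Runs only when scraping succeeded.
            store_row(connection, address, date_issue)
            stored.append(ber)
        finally:
            # The single finally clause: always close the browser.
            browser.quit()
    return stored
```

With PyMySQL, `connection` would be the object returned by `pymysql.connect(...)` in the question, opened once before the loop and closed once after it rather than on every iteration. Using `else` keeps the insert out of the path taken when scraping fails, so `address` and `date_issue` are never referenced before they have been assigned.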