Speeding up data scraping and MySQL inserts

Date: 2018-04-20 21:29:03

Tags: mysql python-3.x xpath web-scraping

I am currently scraping a website that provides tables of data. The structure is as follows:

<table>
 <tr> <!-- This is the first row -->
   <td> data 1 </td>
   .....
 </tr>
 ....
</table>

Let's say each table ends up with 20 rows and 10 columns. My script has to go from one table to the next, and there are between 100 and 1000 tables in total.

So, with XPath I locate each row, insert its data into two MySQL tables, and move on to the next one. In pseudocode:

for table in tables:  # between 100 and 1000 tables
  for row in table:
    # get the text of each td tag in the row and build a list
    # insert half of the data into table 1 and get the id of the inserted row
    # insert the other half into table 2 with that id, to link the two rows
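
For context, the insert step maps to something like the sketch below. This is a minimal illustration rather than my actual code: PyMySQL is an assumption (mysql-connector's cursor works the same way), and table1, table2 and all column names are hypothetical placeholders.

import pymysql  # assumption: PyMySQL; mysql-connector behaves the same here

conn = pymysql.connect(host='localhost', user='user',
                       password='secret', db='scraped')

def insert_row(cur, row_data):
    # First half of the row goes into table 1 (placeholder names)
    cur.execute(
        'INSERT INTO table1 (col1, col2, col3) VALUES (%s, %s, %s)',
        row_data[:3])
    row_id = cur.lastrowid  # id of the row just inserted

    # Second half goes into table 2, linked through the table 1 id
    cur.execute(
        'INSERT INTO table2 (table1_id, col4, col5) VALUES (%s, %s, %s)',
        [row_id] + row_data[3:])

with conn.cursor() as cur:
    insert_row(cur, ['a', 'b', 'c', 'd', 'e'])
conn.commit()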

I've been timing it to see why and where it takes so long, and I got the following:

Overall time per table: 16 seconds
  Getting the data and generating the list for one row: 0.453 s

  Inserting data into table 1: 0.006 s
  Inserting data into table 2: 0.0067 s

This means that scraping all 1000 tables would take over four hours (16 s × 1000 tables ≈ 4.5 hours), which is far too long, considering that when I used BeautifulSoup the overall time was between 30 and 90 minutes.

Since the problem is in getting the text from each td tag in each row, is there any way to speed it up? Essentially, what I am doing in that part of the script is:

data_in_position_1 = row.find_element_by_xpath('.//td[1]').text
.....
data_in_position_15 = row.find_element_by_xpath('.//td[15]').text

row_data = [data_in_position_1, ....., data_in_position_15]

return row_data
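
Each find_element_by_xpath call above is a separate round trip to the WebDriver, so fifteen of them per row add up. The same extraction can be written with a single find call per row; a sketch using the same Selenium 3 API as above (each .text read is still one round trip, so this roughly halves the calls rather than eliminating them):

def get_row_data(row):
    # One find call for the whole row instead of fifteen separate lookups
    cells = row.find_elements_by_xpath('.//td')
    return [cell.text for cell in cells]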

Well, I don't know whether scraping the whole table at once, or some other approach, would give different results, but I need some way to speed this up. In case it helps frame the question, "scraping the whole table at once" could look like the sketch after this paragraph: one WebDriver call pulls the table's HTML out of the browser, and the cells are then parsed locally with no further per-cell traffic. lxml here is an assumption, not something my script currently uses.
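
from lxml import html  # assumption: lxml is installed

def scrape_table(table_element):
    # Single WebDriver call: fetch the table's complete HTML
    tree = html.fromstring(table_element.get_attribute('outerHTML'))
    # Extract every cell locally; no more browser round trips
    return [[td.text_content().strip() for td in tr.xpath('./td')]
            for tr in tree.xpath('.//tr')]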

Thanks

0 Answers:

No answers