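I'm trying to scrape some hidden tables (15 tables per page) that expand when their arrow is clicked. (Screenshots attached: Unexpanded tables, Expanded tables.)
I've also attached the HTML (sorry, it's a bit long):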
<table class="footable table toggle-arrow-tiny default breakpoint footable-loaded" transparenturl="Images/arrow_none.gif" ascendingurl="Images/arrow_up.gif" customsortdirection="Ascending" custompageindex="0" customsortfield="fullname" custompagealphaindex="A" custompagemode="ABC" custompagealpharelative="A" descendingurl="Images/arrow_down.gif" customvirtualcount="1605" id="MainContent_gw_partners" style="border-collapse:collapse;" cellspacing="0">
<thead>
<tr>
<th data-toggle="true" scope="col" class="footable-visible footable-first-column"> </th><th data-ignore="true" data-hide="phone, tablet" scope="col" class="footable-visible"> </th><th data-ignore="true" data-hide="phone, tablet" scope="col" class="footable-visible">Titolo </th><th scope="col" class="footable-visible">Cognome </th><th data-ignore="true" data-hide="phone, tablet" scope="col" class="footable-visible">NPA </th><th data-ignore="true" data-hide="phone" scope="col" class="footable-visible">Luogo </th><th data-ignore="true" data-hide="phone" scope="col" class="footable-visible footable-last-column">Cantone </th><th data-hide="all" scope="col" style="display: none;">Discipline(s) thérapeutique(s) </th><th data-hide="all" scope="col" style="display: none;">Società </th><th data-hide="all" scope="col" style="display: none;">Cognome </th><th data-hide="all" scope="col" style="display: none;">C/O </th><th data-hide="all" scope="col" style="display: none;">Via </th><th data-hide="all" scope="col" style="display: none;">NPA </th><th data-hide="all" scope="col" style="display: none;">Luogo </th><th data-hide="all" scope="col" style="display: none;">Tel / Cellulare </th><th data-hide="all" scope="col" style="display: none;">Cellulare </th><th data-hide="all" scope="col" style="display: none;">Fax </th><th data-hide="all" scope="col" style="display: none;">e-mail </th><th data-hide="all" scope="col" style="display: none;">Sito WEB </th><th data-hide="all" scope="col" style="display: none;">Altri luoghi di lavoro </th><th data-hide="all" scope="col" style="display: none;">Discipline(s) thérapeutique(s) </th>
</tr>
</thead><tbody>
<tr class="row_white footable-detail-show">
<td class="footable-visible footable-first-column"><span class="footable-toggle"></span> </td><td class="footable-visible">
</td><td class="footable-visible"> </td><td class="footable-visible">
ABBONDANZIERI Katia
</td><td class="footable-visible">
1204
<br>
</td><td class="footable-visible">
Genève
<br>
</td><td class="footable-visible footable-last-column">
GE
<br>
</td><td style="display: none;">
197. Omeopatia, 202. Linfodrenaggio manuale, 205. Massaggio classico, 664. Riflessoterapia generale
</td><td style="display: none;">
</td><td style="display: none;">
ABBONDANZIERI Katia
</td><td style="display: none;">
</td><td style="display: none;">
Place du Cirque, 2
</td><td style="display: none;">
1204
</td><td style="display: none;">
Genève
</td><td style="display: none;">
022 328 23 44
</td><td style="display: none;">
079 601 92 75
</td><td style="display: none;">
</td><td style="display: none;">
</td><td style="display: none;">
</td><td style="display: none;">
</td><td style="display: none;">
<div class="thZone"><div class="zCat">METHODES DE MASSAGE</div><div class="zThr">Linfodrenaggio manuale</div><div class="zThr">Massaggio classico</div><div class="zCat">METHODES PRESCRIPTIVES</div><div class="zThr">Omeopatia</div><div class="zCat">METHODES REFLEXES</div><div class="zThr">Riflessoterapia generale</div></div>
</td>
</tr><tr class="footable-row-detail" style="display: table-row;"><td class="footable-row-detail-cell" colspan="7"><div class="footable-row-detail-inner"><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value">197. Omeopatia, 202. Linfodrenaggio manuale, 205. Massaggio classico, 664. Riflessoterapia generale</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cognome:</div><div class="footable-row-detail-value">ABBONDANZIERI Katia</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Via:</div><div class="footable-row-detail-value">Place du Cirque, 2</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">NPA:</div><div class="footable-row-detail-value">1204</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Luogo:</div><div class="footable-row-detail-value">Genève</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Tel / Cellulare:</div><div class="footable-row-detail-value">022 328 23 44</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cellulare:</div><div class="footable-row-detail-value">079 601 92 75</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value"><div class="thZone"><div class="zCat">METHODES DE MASSAGE</div><div class="zThr">Linfodrenaggio manuale</div><div class="zThr">Massaggio classico</div><div class="zCat">METHODES PRESCRIPTIVES</div><div class="zThr">Omeopatia</div><div class="zCat">METHODES REFLEXES</div><div class="zThr">Riflessoterapia generale</div></div></div></div></div></td></tr><tr class="row_grey footable-detail-show">
<td class="footable-visible footable-first-column"><span class="footable-toggle"></span> </td><td class="footable-visible">
<a href="http://www.kinesiopourtous.ch" target="_blank">
<img title="Link internet" alt="" style="MARGIN-RIGHT: 7px" src="Images/pictoSiteInternet.jpg" width="12" height="12" border="0">
</a>
</td><td class="footable-visible"> </td><td class="footable-visible">
<img id="MainContent_gw_partners_img1_1" src="Images/multi.gif">
ABEGG Sophie
</td><td class="footable-visible">
1212
<br>
1875<br>
</td><td class="footable-visible">
Grand-Lancy
<br>
<nobr>Morgins</nobr><nobr><br>
</nobr></td><td class="footable-visible footable-last-column">
GE
<br>
VS<br>
</td><td style="display: none;">
199. Kinesiologia
</td><td style="display: none;">
Kinéso pour tous
</td><td style="display: none;">
ABEGG Sophie
</td><td style="display: none;">
</td><td style="display: none;">
Rue du Bachet 8
</td><td style="display: none;">
1212
</td><td style="display: none;">
Grand-Lancy
</td><td style="display: none;">
</td><td style="display: none;">
076 365 63 86
</td><td style="display: none;">
</td><td style="display: none;">
<a href="mailto:sophie@kinesiopourtous.ch">sophie[at]kinesiopourtous.ch
</a>
</td><td style="display: none;">
<a href="http://www.kinesiopourtous.ch" target="_blank">
www.kinesiopourtous.ch
</a>
</td><td style="display: none;">
Résidence Bellevue, Rte de France 22, 1875 Morgins, CH<br>
</td><td style="display: none;">
<div class="thZone"><div class="zCat">METHODES ENERGETIQUES MANUELLES</div><div class="zThr">Kinesiologia</div></div>
</td>
</tr><tr class="footable-row-detail"><td class="footable-row-detail-cell" colspan="7"><div class="footable-row-detail-inner"><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value">199. Kinesiologia</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Società:</div><div class="footable-row-detail-value">Kinéso pour tous</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cognome:</div><div class="footable-row-detail-value">ABEGG Sophie</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Via:</div><div class="footable-row-detail-value">Rue du Bachet 8</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">NPA:</div><div class="footable-row-detail-value">1212</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Luogo:</div><div class="footable-row-detail-value">Grand-Lancy</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Cellulare:</div><div class="footable-row-detail-value">076 365 63 86</div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">e-mail:</div><div class="footable-row-detail-value"><a href="mailto:sophie@kinesiopourtous.ch">sophie[at]kinesiopourtous.ch
</a></div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Sito WEB:</div><div class="footable-row-detail-value"><a href="http://www.kinesiopourtous.ch" target="_blank">
www.kinesiopourtous.ch
</a></div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Altri luoghi di lavoro:</div><div class="footable-row-detail-value">Résidence Bellevue, Rte de France 22, 1875 Morgins, CH<br></div></div><div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value"><div class="thZone"><div class="zCat">METHODES ENERGETIQUES MANUELLES</div><div class="zThr">Kinesiologia</div></div></div></div></div></td></tr><tr class="row_white">
<td class="footable-visible footable-first-column"><span class="footable-toggle"></span> </td><td class="footable-visible">
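So I am using Selenium for the clicking and BeautifulSoup 4 for scraping the tables.
I'd like to write a loop that clicks each arrow (15 arrows per page) and scrapes the data from each table (13 rows per table; if a value is missing, the corresponding cell should be left blank in the output Excel file).
Any help would be appreciated.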
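To make the goal concrete: in the HTML above, every expanded table is a list of `footable-row-detail-name` / `footable-row-detail-value` pairs, so each table can become one record keyed by field name, with missing fields left blank. The following is only a minimal sketch under some assumptions: it uses the standard-library `html.parser` instead of BeautifulSoup so that it runs without extra packages, it writes CSV (which Excel can open) rather than a real Excel file, and `FIELDS`, `DetailRowParser`, `to_rows` and `to_csv` are illustrative names, not anything from the site.

```python
import csv
import io
from html.parser import HTMLParser

# The 13 distinct hidden column names from the header row of the HTML above.
FIELDS = ["Discipline(s) thérapeutique(s)", "Società", "Cognome", "C/O", "Via",
          "NPA", "Luogo", "Tel / Cellulare", "Cellulare", "Fax", "e-mail",
          "Sito WEB", "Altri luoghi di lavoro"]


class DetailRowParser(HTMLParser):
    """Collect name/value pairs from expanded footable detail rows."""

    def __init__(self):
        super().__init__()
        self.records = []    # one dict per expanded table
        self._target = None  # "name" or "value" while inside such a div
        self._depth = 0      # <div> nesting depth inside the current target
        self._buf = []
        self._name = None

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._target:
            self._depth += 1
            self._buf.append(" ")  # keep text of nested divs separated
        elif "footable-row-detail-inner" in classes:
            self.records.append({})  # a new expanded table starts here
        elif "footable-row-detail-name" in classes:
            self._target, self._depth, self._buf = "name", 1, []
        elif "footable-row-detail-value" in classes:
            self._target, self._depth, self._buf = "value", 1, []

    def handle_endtag(self, tag):
        if tag != "div" or not self._target:
            return
        self._depth -= 1
        if self._depth:
            return
        text = " ".join("".join(self._buf).split())
        if self._target == "name":
            self._name = text.rstrip(":")
        elif self.records and self._name is not None:
            # The discipline field appears twice per table; keep the first value.
            self.records[-1].setdefault(self._name, text)
        self._target = None

    def handle_data(self, data):
        if self._target:
            self._buf.append(data)


def to_rows(html):
    """Return one list per table, ordered like FIELDS, blank where data is missing."""
    parser = DetailRowParser()
    parser.feed(html)
    return [[record.get(field, "") for field in FIELDS] for record in parser.records]


def to_csv(rows):
    """Serialize the rows (plus a header line) as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return out.getvalue()


# Trimmed-down sample in the same shape as the detail rows quoted above.
SAMPLE = """
<tr class="footable-row-detail"><td class="footable-row-detail-cell" colspan="7">
<div class="footable-row-detail-inner">
<div class="footable-row-detail-row"><div class="footable-row-detail-name">Cognome:</div><div class="footable-row-detail-value">ABBONDANZIERI Katia</div></div>
<div class="footable-row-detail-row"><div class="footable-row-detail-name">Via:</div><div class="footable-row-detail-value">Place du Cirque, 2</div></div>
</div></td></tr>
<tr class="footable-row-detail"><td class="footable-row-detail-cell" colspan="7">
<div class="footable-row-detail-inner">
<div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value">199. Kinesiologia</div></div>
<div class="footable-row-detail-row"><div class="footable-row-detail-name">Società:</div><div class="footable-row-detail-value">Kinéso pour tous</div></div>
<div class="footable-row-detail-row"><div class="footable-row-detail-name">Discipline(s) thérapeutique(s):</div><div class="footable-row-detail-value"><div class="thZone"><div class="zCat">METHODES ENERGETIQUES MANUELLES</div><div class="zThr">Kinesiologia</div></div></div></div>
</div></td></tr>
"""

print(to_csv(to_rows(SAMPLE)))
```

With Selenium, the string passed to `to_rows` would be `driver.page_source` taken after the rows have been expanded. Looking fields up by name rather than by position is what leaves the blank cells in the right columns when a table has fewer than 13 entries.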
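Answer 0 (score: 0)
If you inspect the network traffic, you can see "Request Method: POST", so a different approach is possible: send the POST request yourself instead of clicking through the page.
If you would still prefer to use Selenium, let me know and I can try that approach as well.
You will need to grab the form data and copy it into a payload dictionary. I did not include all of it because it is too long, but I put a small snippet in the code so you can see the format.
Then I simply use pandas to read the data tables.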
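The payload step described above can be sketched as follows. This is not the original snippet from this answer, only a minimal illustration using the standard library. It assumes the page is a typical ASP.NET form whose state lives in hidden inputs such as `__VIEWSTATE` and `__EVENTVALIDATION`. Those field names, the sample form, and the extra `ctl00$MainContent$btn_submit` entry are assumptions for illustration; the real names have to be copied from the browser's network tab.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_payload(form_html, extra):
    """Merge the page's hidden form fields with the fields you set yourself."""
    parser = HiddenInputParser()
    parser.feed(form_html)
    payload = dict(parser.fields)
    payload.update(extra)
    return payload


def post_form(url, payload, timeout=30):
    """POST the payload as a URL-encoded form and return the response body as text."""
    data = urlencode(payload).encode("utf-8")
    request = Request(url, data=data,
                      headers={"Content-Type": "application/x-www-form-urlencoded"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Made-up form fragment; the real page's field names and values will differ.
SAMPLE_FORM = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="visible_box" value="ignored" />
</form>
"""

# Hypothetical extra field standing in for the search button.
payload = build_payload(SAMPLE_FORM, {"ctl00$MainContent$btn_submit": "Ricerca"})
print(payload)
```

`post_form` is defined but not called here, since it needs network access and the real payload. The HTML it returns could then be handed to `pandas.read_html` (wrapped in `io.StringIO`) to get the tables as DataFrames, as mentioned above.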
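Answer 1 (score: 0)
Here is the Selenium approach for expanding those tables. There is a better way to handle the waiting while the page loads, but I wanted to get this to you quickly, so it simply uses time.sleep.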
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
import time

url = 'http://www.asca.ch/Partners.aspx?lang=it'
driver = webdriver.Chrome()
driver.get(url)

# Click the dropdown, select GE, tick the disclaimer (Confermo), click search (Ricerca)
# Note: find_element(By.XPATH, ...) replaces find_element_by_xpath, which Selenium 4 removed
driver.find_element(By.XPATH, '//*[@id="ctl00_MainContent_ddl_cantons_Arrow"]').click()
time.sleep(2)
driver.find_element(By.XPATH, '//*[@id="ctl00_MainContent_ddl_cantons_DropDown"]/div/ul/li[9]').click()
driver.find_element(By.XPATH, '//*[@id="MainContent__chkDisclaimer"]').click()
driver.find_element(By.XPATH, '//*[@id="MainContent_btn_submit"]').click()
time.sleep(5)

# Function to expand the tables by clicking every row
def expand_tables():
    rows = driver.find_elements(By.XPATH, '//*[@id="MainContent_gw_partners"]/tbody/tr')
    for row in rows:
        row.click()

# Function to click the next-page button
def click_next_page():
    driver.find_element(By.XPATH, '//*[@id="MainContent_btnNextPackId"]').click()

page = 1
while True:
    print('Page: %s' % page)
    expand_tables()
    ## Your code to parse the tables ##
    try:
        click_next_page()
        page += 1
    except WebDriverException:
        # The next-page button is missing or cannot be clicked: stop instead of looping forever
        print('You are at the end')
        break
    time.sleep(2)

# When finished
driver.close()
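As for the "better way" to wait mentioned above: the usual idea is to poll for a condition with a timeout instead of sleeping for a fixed time. Selenium ships this as `WebDriverWait` together with `expected_conditions`. The sketch below shows the same idea as a small standard-library helper so that it can run without a browser. `wait_until` is a name made up for this example, and the Selenium call in the comment is only an illustration of how it might be used.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s s" % timeout)
        time.sleep(poll)


# Possible use with the driver from the code above (assumes Selenium 4 and By imported):
#   table = wait_until(lambda: driver.find_elements(By.ID, "MainContent_gw_partners"))
# find_elements returns an empty list until the table exists, so polling continues.

# Stand-alone demonstration: the condition becomes true on the third call.
calls = []

def ready_after_three_calls():
    calls.append(1)
    return len(calls) >= 3

print(wait_until(ready_after_three_calls, timeout=2.0, poll=0.01))
```

Compared with `time.sleep(5)`, this returns as soon as the page is ready and fails loudly if it never is, rather than continuing with a half-loaded page.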
Answer 2 (score: 0)
Sorry, my code would not fit in a comment, so I am posting it as an answer.
Here is my code for parsing the tables:
# To find all the tables
table = soup.find('table', {'class': 'footable'})
# To get all rows in that table
rows = table.find_all('tr')
# A function to process each row
def processRow(row):
    #All rows with hidden data
    dataFields = row.find_all('td', {'style': True}
    output = {}
    #Fixed index numbers are not ideal but in this case will work
    output['Discipline'] = dataFields[0].text
    output['Cogome'] = dataFields[2].text
    output['Cellulare'] = dataFields[8].text
    output['email'] = dataFields[10].text
    return output
# Declaring a list to store all results
results = []
# Iterating over all the rows and storing the processed result in a list
for row in rows:
    results.append(processRow(row))
print(results)
click_next_page()
time.sleep(3)
count += 1
I think something is wrong. I get "SyntaxError: invalid syntax" at the line "output = {}", just below "# A function to process each row".
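A likely cause is the line just before `output = {}`: the call `row.find_all('td', {'style': True}` is missing its closing parenthesis, and older Python versions report an unclosed parenthesis on the following line. The sketch below checks this with the built-in `compile` on a shortened copy of the function. `BROKEN`, `FIXED` and `check` are names made up for this example, and the reported line number depends on the Python version.

```python
# Shortened copy of the function from the post, with the parenthesis left open.
BROKEN = (
    "def processRow(row):\n"
    "    dataFields = row.find_all('td', {'style': True}\n"
    "    output = {}\n"
    "    return output\n"
)

# Same code with the closing parenthesis added to the find_all call.
FIXED = BROKEN.replace("{'style': True}\n", "{'style': True})\n")


def check(source):
    """Compile the source without running it and describe the result."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:
        return "SyntaxError at line %s: %s" % (exc.lineno, exc.msg)
    return "ok"


print(check(BROKEN))
print(check(FIXED))
```

Once the parenthesis is closed the function compiles. Separately, the key `'Cogome'` looks like a typo for `'Cognome'`, and header or detail rows without hidden `td` cells would make the fixed indexes fail, so it may be worth skipping rows where `dataFields` is empty.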