How to find a value in a table that has no identifiers? (Python, Selenium)

Time: 2019-03-27 07:08:52

Tags: python selenium selenium-webdriver

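I have a web page with a table that has many rows. The user gives me a number (15308), which appears in the first <td> of one of the rows; that number is the only information I have. I want to use it to read the data between the <th></th> tags of that row (in this example, 0), and only for that row. Below are two of the table rows: with the number 15308 I want the <th> value from the first row, not anything from the row whose first <td> contains 15309. Thanks for your help!
Expected output: 0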

<tr>
<td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=15308">15308</a></td>
<td nowrap="">INFO   101  </td>
<td>A </td>
<td align="CENTER">LC</td>
<td>SOCIAL NETWORKING   </td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 250</td>
<th align="CENTER">  0</th><td align="CENTER"> 229</td>
<td></td>
</tr>
<tr><td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=15309">15309</a></td>
<td nowrap="">INFO   101  </td>
<td>AA</td>
<td align="CENTER">LB</td>
<td>SOCIAL NETWORKING   </td>
<td align="CENTER">  25</td>
<td align="CENTER">  25</td>
<td align="CENTER">  26</td>
<th align="CENTER" style="">  2</th><td align="CENTER">  21</td>
<td></td>
</tr>

2 answers:

Answer 0 (score: 1)

Use the following code:

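from selenium.webdriver.common.by import By

# Assumes `driver` is an existing WebDriver with the page already loaded
user_value = '15308'
# Match the <td> whose text is the number, then every <td>/<th> after it in the same row.
# (A trailing "|th" would be evaluated from the document root and never match the row's <th>.)
all_td_th_of_row = driver.find_elements(
    By.XPATH,
    "//td[normalize-space()='" + user_value + "']/following-sibling::*[self::td or self::th]")
for cell in all_td_th_of_row:
    print(cell.text)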
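The idea in this answer, matching the cell by its text and then staying within that row, can be checked without a browser. Below is a minimal sketch using the standard library's ElementTree, which supports only a small XPath subset; it assumes the fragment is well-formed XML (real pages often are not, which is why the answers use a browser or BeautifulSoup) and that the number is plain digits. `th_for_sln` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The two rows from the question, wrapped in <table> so the fragment has one root.
# Assumption: the markup is well-formed XML; real pages frequently are not.
ROWS = """<table>
<tr><td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=15308">15308</a></td>
<td nowrap="">INFO   101  </td><td>A </td><td align="CENTER">LC</td>
<td>SOCIAL NETWORKING   </td><td align="CENTER"> 150</td><td align="CENTER"> 150</td>
<td align="CENTER"> 250</td><th align="CENTER">  0</th><td align="CENTER"> 229</td><td></td></tr>
<tr><td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=15309">15309</a></td>
<td nowrap="">INFO   101  </td><td>AA</td><td align="CENTER">LB</td>
<td>SOCIAL NETWORKING   </td><td align="CENTER">  25</td><td align="CENTER">  25</td>
<td align="CENTER">  26</td><th align="CENTER" style="">  2</th><td align="CENTER">  21</td><td></td></tr>
</table>"""


def th_for_sln(table_xml, sln):
    """Return the stripped <th> text of the row whose link text equals sln, or None."""
    root = ET.fromstring(table_xml)
    # [a='...'] keeps a <td> whose <a> child has exactly that text; '..' steps up to its <tr>
    row = root.find(".//td[a='%s']/.." % sln)
    if row is None:
        return None
    th = row.find("th")
    return th.text.strip() if th is not None and th.text else None


print(th_for_sln(ROWS, "15308"))  # 0
print(th_for_sln(ROWS, "15309"))  # 2
```

In Selenium, which evaluates full XPath 1.0, the same row-scoped lookup could target the header cell directly, e.g. `driver.find_element(By.XPATH, "//td[normalize-space()='15308']/following-sibling::th[1]")`.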

Answer 1 (score: 0)

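Using BeautifulSoup, which I have always found pleasant to work with:

Selecting the <th> by its xpath="1" attribute (present in my copy of the markup, although the HTML posted in the question does not have it):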

line = '''<tr><td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=15308" style="">15308</a></td>
<td nowrap="">INFO   101  </td>
<td>A </td>
<td align="CENTER">LC</td>
<td>SOCIAL NETWORKING   </td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 250</td>
<th align="CENTER" style="" xpath="1">  0</th><td align="CENTER"> 229</td>
<td></td>
</tr>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(line, 'html.parser')
xpathTh = soup.find('th',  attrs={'xpath': '1'})
print(xpathTh.text.strip())

Output

0
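Because the HTML posted in the question has no `xpath="1"` attribute, the attribute lookup above finds nothing on that markup. A small check with BeautifulSoup on the question's first row, next to a lookup that anchors on the row number instead (a sketch, assuming the number is the exact text of the row's link):

```python
from bs4 import BeautifulSoup

# First row exactly as posted in the question: its <th> has no xpath="1" attribute
question_row = '''<tr>
<td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=15308">15308</a></td>
<td nowrap="">INFO   101  </td>
<td>A </td>
<td align="CENTER">LC</td>
<td>SOCIAL NETWORKING   </td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 250</td>
<th align="CENTER">  0</th><td align="CENTER"> 229</td>
<td></td>
</tr>'''

soup = BeautifulSoup(question_row, 'html.parser')

# The copied attribute is absent, so this returns None
by_attr = soup.find('th', attrs={'xpath': '1'})
# Find the link by its text, climb to its <tr>, then take that row's <th>
row = soup.find('a', string='15308').find_parent('tr')
by_row = row.find('th')

print(by_attr)              # None
print(by_row.text.strip())  # 0
```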

Edit

To get the values of every <th> with that attribute:

line = '''<tr><td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=15308" style="">15308</a></td>
<td nowrap="">INFO   101  </td>
<td>A </td>
<td align="CENTER">LC</td>
<td>SOCIAL NETWORKING   </td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 250</td>
<th align="CENTER" style="" xpath="1">  0</th><td align="CENTER"> 229</td>
<th align="CENTER" style="" xpath="1">  1</th><td align="CENTER"> 229</td>
<th align="CENTER" style="" xpath="1">  2</th><td align="CENTER"> 229</td>
<td></td>
</tr>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(line, 'html.parser')
xpathTh = soup.find_all('th',  attrs={'xpath': '1'})

for elem in xpathTh:
    print(elem.text.strip())

Output

0
1
2
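If many numbers will be looked up, one option is to walk the table once and build a dictionary from each row's number to its <th> value. A minimal sketch with the standard library's `html.parser`, so it needs no third-party package; `RowThCollector` is a hypothetical name, and it assumes the number is the first cell of each row and keeps only the first <th> per row. The rows are trimmed versions of the question's rows.

```python
from html.parser import HTMLParser


class RowThCollector(HTMLParser):
    """Map each row's first-cell text (e.g. '15308') to the text of that row's first <th>."""

    def __init__(self):
        super().__init__()
        self.values = {}       # row number -> <th> text
        self._cells = None     # (tag, text) pairs for the current row; None outside a row
        self._cell_tag = None  # 'td' or 'th' while inside a cell
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._cells = []
        elif tag in ('td', 'th') and self._cells is not None:
            self._cell_tag, self._buf = tag, []

    def handle_data(self, data):
        # Text inside nested tags such as <a> still belongs to the enclosing cell
        if self._cell_tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell_tag == tag:
            self._cells.append((tag, ''.join(self._buf).strip()))
            self._cell_tag = None
        elif tag == 'tr' and self._cells is not None:
            if self._cells:
                key = self._cells[0][1]
                th = next((text for t, text in self._cells if t == 'th'), None)
                if key and th is not None:
                    self.values[key] = th
            self._cells = None


html = '''<table>
<tr><td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=15308">15308</a></td>
<td nowrap="">INFO   101  </td><th align="CENTER">  0</th><td align="CENTER"> 229</td></tr>
<tr><td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=15309">15309</a></td>
<td nowrap="">INFO   101  </td><th align="CENTER" style="">  2</th><td align="CENTER">  21</td></tr>
</table>'''

parser = RowThCollector()
parser.feed(html)
print(parser.values)                # {'15308': '0', '15309': '2'}
print(parser.values.get('15308'))   # 0
```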

Edit 2

Considering you only want the xpath value inside the tr whose td anchor tag has the value 15308:

line = '''<tr><td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=15308" style="">15308</a></td>
<td nowrap="">INFO   101  </td>
<td>A </td>
<td align="CENTER">LC</td>
<td>SOCIAL NETWORKING   </td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 250</td>
<th align="CENTER" style="" xpath="1">  0</th><td align="CENTER"> 229</td>
<td></td>
</tr>
<tr><td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=2222" style="">22222</a></td>
<td nowrap="">INFO   101  </td>
<td>A </td>
<td align="CENTER">LC</td>
<td>SOCIAL NETWORKING   </td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 250</td>
<th align="CENTER" style="" xpath="1">  1</th><td align="CENTER"> 229</td>
<td></td>
</tr>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(line, 'html.parser')

trElems = soup.find_all('tr')
toFind = '15308'

for tr in trElems:
    val = tr.select('td a')[0].text
    if toFind == val:
        xpathTh = tr.find_all('th', attrs={'xpath': '1'})
        for elem in xpathTh:
            print(elem.text.strip())
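`tr.select('td a')[0]` raises `IndexError` on any row without a link, such as a header row. A guarded sketch using `select_one`, which returns None instead; the header row below is hypothetical (the question's HTML does not show one), and unlike the code above it returns every <th> in the matching row rather than filtering on the `xpath` attribute. `th_values_for` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical header row (no link) followed by a trimmed version of the question's first row
html = '''<table>
<tr><th>SLN</th><th>Course</th><th>Enrl</th></tr>
<tr><td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=15308">15308</a></td>
<td nowrap="">INFO   101  </td><th align="CENTER">  0</th><td align="CENTER"> 229</td></tr>
</table>'''


def th_values_for(html, to_find):
    """Return the stripped <th> texts in the row whose link text equals to_find."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for tr in soup.find_all('tr'):
        link = tr.select_one('td a')  # None for rows without a link, e.g. the header row
        if link is None or link.get_text(strip=True) != to_find:
            continue
        results.extend(th.get_text(strip=True) for th in tr.find_all('th'))
    return results


print(th_values_for(html, '15308'))  # ['0']
```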

Output

0

Edit 3

Following up on the comments:

line = '''<tr>
<td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=15308">15308</a></td>
<td nowrap="">INFO   101  </td>
<td>A </td>
<td align="CENTER">LC</td>
<td>SOCIAL NETWORKING   </td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 150</td>
<td align="CENTER"> 250</td>
<th align="CENTER">  0</th><td align="CENTER"> 229</td>
<td></td>
</tr>
<tr><td><a href="http://sdb.admin.uw.edu/timeschd/UWNetID/sln.asp?QTRYR=SPR+2019&amp;SLN=15309">15309</a></td>
<td nowrap="">INFO   101  </td>
<td>AA</td>
<td align="CENTER">LB</td>
<td>SOCIAL NETWORKING   </td>
<td align="CENTER">  25</td>
<td align="CENTER">  25</td>
<td align="CENTER">  26</td>
<th align="CENTER" style="">  2</th><td align="CENTER">  21</td>
<td></td>
</tr>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(line, 'html.parser')

trElems = soup.find_all('tr')
toFind = '15308'

for tr in trElems:
    val = tr.select('td a')[0].text
    if toFind == val:
        xpathTh = tr.find_all('td')[7]
        print("For the value: {}, The result is {}".format(toFind, xpathTh.find_next('th').text.strip()))

Output

For the value: 15308, The result is 0
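Edit 3 reaches the value with `find_all('td')[7]` followed by `find_next('th')`. `find_next` searches everything after the element in document order, not just the current row, so if a row ever lacks a <th> it silently returns the next row's value. A small demonstration on hypothetical rows (the first deliberately missing its <th>), contrasting it with `tr.find('th')`, which only searches inside the row:

```python
from bs4 import BeautifulSoup

# Hypothetical rows: 15308 has no <th>, 15309 does
html = '''<table>
<tr><td><a>15308</a></td><td>INFO 101</td><td>A</td><td>LC</td><td>SOCIAL NETWORKING</td>
<td>150</td><td>150</td><td>250</td><td>229</td></tr>
<tr><td><a>15309</a></td><td>INFO 101</td><td>AA</td><td>LB</td><td>SOCIAL NETWORKING</td>
<td>25</td><td>25</td><td>26</td><th>2</th><td>21</td></tr>
</table>'''

soup = BeautifulSoup(html, 'html.parser')
row = soup.find('a', string='15308').find_parent('tr')

# find_next() keeps going past the end of this row and lands on the next row's <th>
leaked = row.find_all('td')[7].find_next('th')
# find() stays inside the row, so the missing <th> shows up as None
scoped = row.find('th')

print(leaked.text)  # 2
print(scoped)       # None
```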