I'm trying to scrape data off a table on a web page using Python, BeautifulSoup, Requests, as well as Selenium to log into the site. Here's the table I'm looking to get data for...
<div class="sastrupp-class">
<table>
<tbody>
<tr>
<td class="key">Thing I dont want 1</td>
<td class="value money">$1.23</td>
<td class="key">Thing I dont want 2</td>
<td class="value">99,999,999</td>
<td class="key">Target</td>
<td class="money value">$1.23</td>
<td class="key">Thing I dont want 3</td>
<td class="money value">$1.23</td>
<td class="key">Thing I dont want 4</td>
<td class="value percentage">1.23%</td>
<td class="key">Thing I dont want 5</td>
<td class="money value">$1.23</td>
</tr>
</tbody>
</table>
</div>
I've tried this:
output = soup.find('td', {'class':'key'})
print(output)
but that doesn't return anything.
Important to note:
2. There are other <div>s with class="sastrupp-class" on the site.
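Answer 0 (score: -1)
1) First, to get "Target" you need find_all, not find. Then, given that you know exactly which position your target will be in (index = 2 in the example you gave), you can get it like this: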
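from bs4 import BeautifulSoup

html = """(YOUR HTML)"""
soup = BeautifulSoup(html, 'html.parser')

# Narrow the search to the first div with this class, then collect its key cells
table = soup.find('div', {'class': 'sastrupp-class'})
all_keys = table.find_all('td', {'class': 'key'})
my_key = all_keys[2]  # 'Target' is the third key cell in the example
print(my_key.text)  # prints 'Target'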
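Indexing into the key cells only returns the label cell, and it breaks if the column order changes. If what you actually want is the value sitting next to "Target", a rough sketch is to match the key cell by its text and then step to the next <td>. The helper name value_for_key is made up for illustration, the sample value $4.56 is changed from the question's HTML so the result is distinguishable, and the string= argument assumes a reasonably recent bs4 (4.4.0 or later):

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the question's table; "$4.56" is a made-up sample value.
html = """
<div class="sastrupp-class">
<table><tbody><tr>
<td class="key">Thing I dont want 1</td>
<td class="value money">$1.23</td>
<td class="key">Target</td>
<td class="money value">$4.56</td>
</tr></tbody></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def value_for_key(soup, label):
    """Return the text of the <td> that follows the key cell named `label`."""
    # Find the key cell whose text is exactly `label`.
    key_cell = soup.find("td", class_="key", string=label)
    if key_cell is None:
        return None
    # The value is the next <td> sibling; whitespace between cells is skipped.
    value_cell = key_cell.find_next_sibling("td")
    return value_cell.get_text(strip=True) if value_cell else None


print(value_for_key(soup, "Target"))  # prints $4.56
```

This relies on the label text matching exactly, so it may need adjusting if the real page pads the cell with whitespace or nested tags.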
2)
There are other <div>s on the site with class="sastrupp-class"
Likewise, you need to use find_all to select the one you want and then pick the right index.
Example HTML:
<body>
<div class="sastrupp-class"> Don't need this</div>
<div class="sastrupp-class"> Don't need this</div>
<div class="sastrupp-class"> Don't need this</div>
<div class="sastrupp-class"> Target</div>
</body>
To extract the target, you can:
all_divs = soup.find_all('div', {'class':'sastrupp-class'})
target = all_divs[3] # assuming you know exactly which index to look for
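If you don't know the index ahead of time, one alternative is to loop over the matching divs and keep the first one that contains a key cell with the text you're after. This is only a sketch: find_div_with_key is a hypothetical helper, and the HTML below is a made-up mix of the two examples above, not the real page:

```python
from bs4 import BeautifulSoup

# Invented sample: several divs share the class, only one holds the table.
html = """
<body>
<div class="sastrupp-class">Don't need this</div>
<div class="sastrupp-class"><table><tr>
<td class="key">Target</td><td class="money value">$1.23</td>
</tr></table></div>
<div class="sastrupp-class">Don't need this</div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")


def find_div_with_key(soup, label, div_class="sastrupp-class"):
    """Return the first div of `div_class` containing a key cell named `label`."""
    for div in soup.find_all("div", class_=div_class):
        # A div qualifies if it holds a <td class="key"> whose text is `label`.
        if div.find("td", class_="key", string=label) is not None:
            return div
    return None


target_div = find_div_with_key(soup, "Target")
print(target_div.find("td", class_="value").get_text())  # prints $1.23
```

Selecting by content rather than by position should keep working if the site adds or reorders divs, as long as the label text stays the same.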