Question

I'm trying to scrape data off a table on a web page using Python, BeautifulSoup, Requests, as well as Selenium to log into the site. Here's the table I'm looking to get data for...

<div class="sastrupp-class">
        <table>
            <tbody>
                <tr>
                    <td class="key">Thing I dont want 1</td>
                    <td class="value money">$1.23</td>

                    <td class="key">Thing I dont want 2</td>
                    <td class="value">99,999,999</td>

                    <td class="key">Target</td>
                    <td class="money value">$1.23</td>

                    <td class="key">Thing I dont want 3</td>
                    <td class="money value">$1.23</td>

                    <td class="key">Thing I dont want 4</td>
                    <td class="value percentage">1.23%</td>

                    <td class="key">Thing I dont want 5</td>
                    <td class="money value">$1.23</td>
                </tr>
            </tbody>
        </table>
    </div>

I can find the "sastrupp-class" fine, but I don't know how to look through it and get to the part of the table I want. I figured I could just look for the class that I'm searching for like this...

    output = soup.find('td', {'class':'key'})
    print(output)

but that doesn't return anything.

Important to note:

1. Other <td>s inside the table have the same class name as the one that I want. If I can't separate them out, I'm OK with that, although I'd rather just return the one I want.

2. There are other <div>s with class="sastrupp-class" on the site.

I'm obviously a beginner at this so let me know if I can help you help me. Any help/pointers would be appreciated.

Answer 1

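1) First, to get "Target" you need find_all, not find. Then, given that you know exactly which position your target will be at (index = 2 in the example you gave), you can get to the solution like this: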

from bs4 import BeautifulSoup

html = """(YOUR HTML)"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('div', {'class': 'sastrupp-class'})
all_keys = table.find_all('td', {'class': 'key'})
my_key = all_keys[2]

print(my_key.text)  # prints 'Target'
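
If the position of the target is not guaranteed to stay at index 2, a possible variation is to match the key cell by its text and then read the cell next to it. The following is only a sketch: it assumes the key text is exactly "Target", that the value is the next <td> sibling, and it uses a trimmed copy of the table from the question (three of the six key/value pairs).

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (assumption: the real page has this shape).
html = """
<div class="sastrupp-class">
  <table><tbody><tr>
    <td class="key">Thing I dont want 2</td>
    <td class="value">99,999,999</td>
    <td class="key">Target</td>
    <td class="money value">$1.23</td>
    <td class="key">Thing I dont want 4</td>
    <td class="value percentage">1.23%</td>
  </tr></tbody></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", {"class": "sastrupp-class"})

# Match the key cell by its text instead of relying on a fixed index.
target_key = container.find("td", class_="key", string="Target")

# The value is assumed to sit in the next <td> sibling of the key cell.
target_value = target_key.find_next_sibling("td").get_text(strip=True)

print(target_key.get_text(strip=True), target_value)
```

If the key text on the real page has surrounding whitespace or differs in case, the exact-match string filter would return None, so a check for None before calling find_next_sibling is worth adding.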

2)

> There are other <div>s with class="sastrupp-class" on the site.

Again, you need to use find_all to select the one you want, and then pick the correct index.

Example HTML:

<body>
<div class="sastrupp-class"> Don't need this</div>
<div class="sastrupp-class"> Don't need this</div>
<div class="sastrupp-class"> Don't need this</div>
<div class="sastrupp-class"> Target</div>
</body>

To extract the target, you can:

all_divs = soup.find_all('div', {'class':'sastrupp-class'})
target = all_divs[3]  # assuming you know exactly which index to look for
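
For reference, here is a self-contained version of the snippet above, run against the example HTML. It also shows one hedged alternative for when the index is not known in advance: picking the first div whose text matches. The matching rule (stripped text equal to "Target") is an assumption made for this example; on a real page you would pick whatever distinguishes the wanted div.

```python
from bs4 import BeautifulSoup

html = """
<body>
<div class="sastrupp-class"> Don't need this</div>
<div class="sastrupp-class"> Don't need this</div>
<div class="sastrupp-class"> Don't need this</div>
<div class="sastrupp-class"> Target</div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")
all_divs = soup.find_all("div", {"class": "sastrupp-class"})

# Index-based pick, as in the answer (requires knowing the position).
by_index = all_divs[3]

# Content-based pick: first div whose stripped text equals "Target", or None.
by_text = next(
    (div for div in all_divs if div.get_text(strip=True) == "Target"),
    None,
)

print(by_index.get_text(strip=True))
```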

Can't scrape HTML table using BeautifulSoup
