Retrieving data with Beautiful Soup

Time: 2016-08-24 15:20:41

Tags: python beautifulsoup

I've been trying to retrieve some data with BeautifulSoup, but I've hit a brick wall.

<tr data-name="A Color Similar to Slate">
                <th class="unique"><a href="/item/5052/6/223d382afee2ac6857d3298b800652e0" class="item-link"><span style='color: #7D6D00'>A Color Similar to Slate</span></a></th>
                <td class=unique>0/10</td>
                <td class="unique" data-conversion="14 ref">35,000</td>
                <td class="unique" data-conversion="13.02 ref">32,550</td>
                <td class="unique" data-conversion="13.51 ref">33,775</td>
                <td class="unique" style="text-align: center;"><a class="item-link-backpack" href="http://backpack.tf/stats/Unique/A+Color+Similar+to+Slate/tradable/craftable"><img src="/img/bptf-icon.png" alt="View on Backpack.tf"/></a></td>
            </tr>

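What I want my script to do is take an input (in this case the string "A Color Similar to Slate") and have it return the data below it (0/10, 14 ref, and so on) so that I can compare it against a different data set. How can I make this work?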

2 answers:

Answer 0 (score: 1)

similar_color = soup.find('tr', {'data-name': 'A Color Similar to Slate'})
for value in similar_color.find_all('td'):
    print(value.text)

This should output:

0/10
35,000
and so on. However, it looks like you sometimes want the text value and sometimes the data-conversion value. To get the latter, just replace the print(value.text) line with:

print(value.attrs.get('data-conversion'))
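Since the question asks for a lookup by item name, the two prints above can be combined into one helper. The following is only a minimal sketch: the function name lookup_item is made up for illustration, the HTML is a trimmed copy of the row from the question wrapped in a table, and it assumes each item appears in exactly one tr with a data-name attribute.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question (assumption: the real page
# holds many rows like this inside a table).
HTML = """
<table>
<tr data-name="A Color Similar to Slate">
  <th class="unique">A Color Similar to Slate</th>
  <td class="unique">0/10</td>
  <td class="unique" data-conversion="14 ref">35,000</td>
  <td class="unique" data-conversion="13.02 ref">32,550</td>
  <td class="unique" data-conversion="13.51 ref">33,775</td>
</tr>
</table>
"""


def lookup_item(soup, name):
    """Return (text, data-conversion) pairs for every td in the named row.

    Hypothetical helper, not part of Beautiful Soup. Cells without a
    data-conversion attribute get None in the second position.
    """
    row = soup.find('tr', attrs={'data-name': name})
    if row is None:
        # No row with that data-name on the page.
        return []
    return [(td.get_text(strip=True), td.get('data-conversion'))
            for td in row.find_all('td')]


soup = BeautifulSoup(HTML, 'html.parser')
result = lookup_item(soup, 'A Color Similar to Slate')
for text, conversion in result:
    print(text, conversion)
```

Returning pairs rather than printing keeps both values available, which should make the comparison against another data set easier.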

Answer 1 (score: 0)

If you are going to use this on other HTML-style files:

from bs4 import BeautifulSoup
html= """<tr data-name="A Color Similar to Slate">
                <th class="unique"><a href="/item/5052/6/223d382afee2ac6857d3298b800652e0" class="item-link"><span style='color: #7D6D00'>A Color Similar to Slate</span></a></th>
                <td class=unique>0/10</td>
                <td class="unique" data-conversion="14 ref">35,000</td>
                <td class="unique" data-conversion="13.02 ref">32,550</td>
                <td class="unique" data-conversion="13.51 ref">33,775</td>
                <td class="unique" style="text-align: center;"><a class="item-link-backpack" href="http://backpack.tf/stats/Unique/A+Color+Similar+to+Slate/tradable/craftable"><img src="/img/bptf-icon.png" alt="View on Backpack.tf"/></a></td>
            </tr>"""

# Name the parser explicitly; otherwise bs4 picks one and emits a warning.
soup = BeautifulSoup(html, 'html.parser')
texts = [i.get_text() for i in soup.find_all() if i.get_text()]

print(texts[texts.index('A Color Similar to Slate'):])

This checks all tags, not just td. The output is ['A Color Similar to Slate', 'A Color Similar to Slate', 'A Color Similar to Slate', '0/10', '35,000', '32,550', '33,775']
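The item name shows up three times in that output because the th, a, and span tags each report the same nested text. If the duplicates are unwanted, one possible alternative (a sketch, not what the answer above does) is to iterate over the row's stripped_strings, which yields each text node once. The HTML here is a shortened stand-in for the row in the question.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the row from the question.
html = """<tr data-name="A Color Similar to Slate">
<th class="unique"><a href="#" class="item-link"><span>A Color Similar to Slate</span></a></th>
<td class="unique">0/10</td>
<td class="unique" data-conversion="14 ref">35,000</td>
</tr>"""

soup = BeautifulSoup(html, 'html.parser')
row = soup.find('tr', attrs={'data-name': 'A Color Similar to Slate'})

# stripped_strings walks the text nodes under the row, strips surrounding
# whitespace, and skips nodes that are empty after stripping.
values = list(row.stripped_strings)
print(values)
```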