Question

I want to extract some values from a website. The element in question is:

<div class="float_l dcMsg">
    <div class="float_l" style="margin-right: 5px; min-width: 105px;">Slow Stochastic(20,5)</div>
    <div class="float_l ind-color-box" style="margin-right: 5px; background: rgb(242, 38, 31);"></div>
    <div class="float_l" style="margin-right: 5px; min-width: 105px;">%K: 33.996</div>
    <div class="float_l ind-color-box" style="margin-right: 5px; background: rgb(0, 255, 0);"></div>
    <div class="float_l" style="margin-right: 5px; min-width: 105px;">%D: 18.393</div>
</div>

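The values I want are on line 4 (33.996) and line 6 (18.393) of the snippet above.

These numbers come from a dynamic chart, but I don't know whether they are generated by JavaScript. When I press a button on the page, the numbers update to the latest values and the text inside the element changes accordingly. The numbers also change when I hover the cursor over the chart.

The page itself is not reloaded, though. Pressing the button only changes that part of the page, i.e. the numbers inside the element.

I tried this code, but it returns [].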

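import re
from urllib.request import urlopen

# Download the raw HTML of the page.
htmltext = urlopen("http://www.example.com").read().decode("utf-8", errors="replace")

# Capture the text inside each label div.
regex = r'<div class="float_l" style="margin-right: 5px; min-width: 105px;">(.+?)</div>'
pattern = re.compile(regex)

results = pattern.findall(htmltext)
print(results)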

I also tried BeautifulSoup, but it returns [] as well.

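import bs4 as bs
from urllib.request import urlopen

# Download the raw HTML of the page and parse it.
sauce = urlopen("http://www.example.com").read()
soup = bs.BeautifulSoup(sauce, "html.parser")

# Look for divs whose style attribute matches this exact string.
results = soup.find_all("div", style="margin-right: 5px; min-width: 105px;")
print(results)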

Answer 1

Selenium may be a heavier tool than you would like, but it would work.
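A rough sketch of the Selenium route, since the question describes values that change without a page reload, which suggests they are filled in by scripts in the browser. Everything here is an assumption layered on the question's snippet: the function names `fetch_indicator_texts` and `parse_indicator_values` are made up for illustration, the CSS selectors are guessed from the posted HTML, and Chrome with a matching driver is assumed to be available. The container appearing does not guarantee the numbers are already final, so a longer or more specific wait may be needed on the real page.

```python
import re


def parse_indicator_values(texts):
    """Turn labels such as '%K: 33.996' into {'%K': 33.996}."""
    values = {}
    for text in texts:
        # Only labels of the form '%K: <number>' or '%D: <number>' are kept;
        # the title div and the empty colour-box divs are skipped.
        match = re.fullmatch(r"\s*(%[KD]):\s*(-?\d+(?:\.\d+)?)\s*", text)
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


def fetch_indicator_texts(url, timeout=10):
    """Load the page in a real browser and return the text of each inner div."""
    # Imported lazily so the parsing helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are installed
    try:
        driver.get(url)
        # Wait until the indicator container exists in the live DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "div.dcMsg"))
        )
        divs = driver.find_elements(By.CSS_SELECTOR, "div.dcMsg > div.float_l")
        return [div.text for div in divs]
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Usage might look like `parse_indicator_values(fetch_indicator_texts("http://www.example.com"))`. The button press mentioned in the question is not reproduced here.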

If the divs are present in the HTML you parsed, something like this may do:

In [30]: for el in soup.find_all('div'):
    ...:     # Keep plain float_l divs whose inline style sets the 5px right margin.
    ...:     if 'margin-right: 5px' in el.get('style', '') and el.get('class') == ['float_l']:
    ...:         print(el)
    ...:
<div class="float_l" style="margin-right: 5px; min-width: 105px;">Slow Stochastic(20,5)</div>
<div class="float_l" style="margin-right: 5px; min-width: 105px;">%K: 33.996</div>
<div class="float_l" style="margin-right: 5px; min-width: 105px;">%D: 18.393</div>
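The loop above can only find divs that exist in the HTML that was parsed, and the output shown presumably comes from parsing the snippet in the question. Since both attempts in the question return [], it may be worth checking first whether the downloaded page contains the values at all. A minimal sketch, assuming the helper names `fetch_html` and `has_marker` (both hypothetical) and the marker text `%K:` taken from the posted snippet:

```python
from urllib.request import urlopen


def fetch_html(url):
    """Download a page and decode it with the charset the server declares."""
    with urlopen(url) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def has_marker(html, marker="%K:"):
    """Return True if the marker text occurs anywhere in the raw HTML."""
    return marker in html
```

If `has_marker(fetch_html("http://www.example.com"))` is False, the values are most likely inserted by scripts after the page loads, so no regex or BeautifulSoup filter on the downloaded HTML will find them, and a browser-driven tool such as Selenium is the more promising route.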
