Question

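I use a split-page method to extract certain parts of a website. Now I want to extract 5,217 from the second snippet below. So far I have been using this first method to pull values from the site: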

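    import urllib2  # Python 2 standard library

    def idNotation(x):
        request = urllib2.Request("website URL")
        handle = urllib2.urlopen(request)
        content = handle.read()
        # Keep everything after the first marker, then cut at the second marker.
        splitted_page = content.split("part before the split")
        splitted_page = splitted_page[1].split("part after the split")
        # Turn the German decimal comma into a decimal point.
        value = splitted_page[0].replace(",", ".")
        return value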

This method does not work for the following code:

    <tr>
                <td class="bold">
                    Hebel                        <a class="popup icon info right" href="/de/boersenportal/tools-und-services/glossar/glossar/?glossar_word=hebel"></a>
                </td>
                <td class="nowrap last">5,217</td>
            </tr>

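because

    td class="nowrap last"

occurs several times in the source. To isolate the desired part, I would have to include the multi-line code shown below in the first part of the split. The problem is the whitespace: the split method I use does not work when the marker spans several lines of code.

I am looking for a way to extract only 5,217. The part before the split would have to be: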

    Hebel                        <a class="popup icon info right" href="/de/boersenportal/tools-und-services/glossar/glossar/?glossar_word=hebel"></a>
                </td>
                <td class="nowrap last">

Answer 1

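I suspect you are parsing a page similar to https://www.boerse-stuttgart.de/de/boersenportal/wertpapiere-und-maerkte/anlageprodukte/factsheet/?ID_NOTATION=154844044, that the information you want comes from the section of the page labelled 'Kennzahlen', and that you are specifically interested in the number to the right of the word 'Hebel' on such a page.

This code will get it for you. Of course, it looks scary until you get used to it.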

>>> import requests
>>> page = requests.get('https://www.boerse-stuttgart.de/de/boersenportal/wertpapiere-und-maerkte/anlageprodukte/factsheet/?ID_NOTATION=154844044').text
>>> from scrapy.selector import Selector
>>> scrapings = Selector(text=page)
>>> scrapings.xpath('.//td[contains(text(),"Hebel")]/following-sibling::td[@class="nowrap last"]/text()').extract()
['5,084']

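Selector processes page so that any of the hundreds of possible XPath expressions can be used to identify elements of the page. In this case, roughly speaking:

.//: look everywhere in the page.
td[contains(text(),"Hebel")]: find a td element whose text contains 'Hebel'. I did not say 'equals' because the desired string is surrounded by whitespace.
following-sibling: consider the siblings of that td that come after it.
::td[@class="nowrap last"]: restrict consideration to those siblings that are td elements whose class is exactly this.
text(): get the text of that td element.
.extract(): make what the XPath expression found available to Python. In this case it is a list, which is usually the case.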
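To make the steps above concrete without Scrapy or a network connection, here is a rough Python 3 analogue built on the standard library's html.parser. It is only a sketch of the same idea (find the label cell, then take the following td cells with the given class in the same row), not a general XPath engine. The class name LabelValueParser, the SNIPPET string and the extra "Other" row are made up for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser

# Row from the question, plus a made-up second row to show that only the
# cell next to the label is picked up.
SNIPPET = """
<tr>
    <td class="bold">
        Hebel   <a class="popup icon info right" href="/de/boersenportal/tools-und-services/glossar/glossar/?glossar_word=hebel"></a>
    </td>
    <td class="nowrap last">5,217</td>
</tr>
<tr>
    <td class="bold">Other</td>
    <td class="nowrap last">1,000</td>
</tr>
"""


class LabelValueParser(HTMLParser):
    """Collect text of td cells with class 'nowrap last' that follow,
    within the same row, a td whose text contains `label`."""

    def __init__(self, label, cell_class="nowrap last"):
        super().__init__()
        self.label = label
        self.cell_class = cell_class
        self.values = []
        self._in_td = False
        self._td_class = None
        self._td_text = []
        self._label_seen = False  # True once the label cell was seen in this row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._label_seen = False  # siblings only count within one row
        elif tag == "td":
            self._in_td = True
            self._td_class = dict(attrs).get("class")
            self._td_text = []

    def handle_data(self, data):
        if self._in_td:
            self._td_text.append(data)

    def handle_endtag(self, tag):
        if tag != "td" or not self._in_td:
            return
        text = "".join(self._td_text)
        if self._label_seen and self._td_class == self.cell_class:
            self.values.append(text.strip())  # a following sibling with the class
        elif self.label in text:
            self._label_seen = True  # 'contains', not 'equals': whitespace is ignored
        self._in_td = False


parser = LabelValueParser("Hebel")
parser.feed(SNIPPET)
print(parser.values)  # ['5,217']
```

Unlike the XPath version, this sketch strips the surrounding whitespace from each value before returning it.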

http://ricostacruz.com/cheatsheets/xpath.html is very handy.
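If installing Scrapy is not an option, something closer to the original split approach is a regular expression that tolerates whitespace and line breaks between the label and the cell. This is a different technique from the XPath above and is more brittle, since regular expressions do not understand HTML structure. The sketch below is Python 3 and assumes the row layout from the question; the function name extract_after_label and the SNIPPET string are hypothetical.

```python
import re

# Row from the question, held in a string so the example runs offline.
SNIPPET = """
<tr>
    <td class="bold">
        Hebel   <a class="popup icon info right" href="/de/boersenportal/tools-und-services/glossar/glossar/?glossar_word=hebel"></a>
    </td>
    <td class="nowrap last">5,217</td>
</tr>
"""


def extract_after_label(html, label, cell_class="nowrap last"):
    """Return the text of the first td with `cell_class` after `label`, or None."""
    pattern = (
        re.escape(label)
        + r'.*?<td\s+class="' + re.escape(cell_class) + r'"\s*>'
        + r"\s*([^<]*?)\s*</td>"
    )
    # DOTALL lets '.' cross line breaks, which is what str.split could not do
    # without knowing the exact whitespace in advance.
    match = re.search(pattern, html, flags=re.DOTALL)
    return match.group(1) if match else None


raw = extract_after_label(SNIPPET, "Hebel")
print(raw)  # 5,217

# Same conversion as in the question: German decimal comma to decimal point.
value = float(raw.replace(",", "."))
print(value)  # 5.217
```

The match is case-sensitive, so the lowercase "hebel" inside the href is not mistaken for the label.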
