Question

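I'm trying to scrape information from a website using Nokogiri and Curb, but I can't work out which name or heading to scrape by, or even how to do it. I'm trying to scrape the API key, shown as "xxxxxxx" near the bottom of the HTML below. Please help, thanks. :)

The HTML is as follows: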

<body class="html not-front logged-in no-sidebars page-app page-app- page-app-8383900 page-app-keys i18n-en" data-twttr-rendered="true">

<div id="skip-link"></div>
<div id="page-wrapper">
    <!--

     Code for the global nav 

    -->
    <nav id="globalnav" class="without-subnav"></nav>
    <nav id="subnav"></nav>
    <section id="hero" class="hero-short"></section>

<div class="container">
    ::before
    <div id="messages"></div>
    <div id="gaz-content-wrap-outer" class="row">
        ::before
        <div id="gaz-content-wrap-inner" class="span12">
            <div class="row">
                ::before
                <div class="article-wrap span12">
                    <article id="gaz-content-body" class="content">
                        <header></header>
                        <div class="header-action"></div>
                        <div class="tabs"></div>

<div class="d-block d-block-system g-main">

<div class="app-details">
    <h2>

        Application Settings

    </h2>
    <div class="description"></div>
    <div class="app-settings">
        <div class="row">
            ::before
            <span class="heading">

                Consumer Key (API Key)

            </span>
            <span>

                xxxxxxxxx

            </span>

All I can get is the text "content".

My code is as follows:

consumer = html.at("#gaz-content-body")['class']
puts consumer

I'm not sure what to type to select by class and/or get the text. All I can get Nokogiri to print is "content".

Answer 1

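In this case we need the second span, the one that follows the span with class 'heading', inside the element with class 'app-settings' (keeping the selector a little general, but not too much). I used .search instead of .at to retrieve both spans and then take the second one.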

# Gets the 2 span elements under <div class='app-settings'>.
res = html.search('#gaz-content-body .app-settings span')

# Use .text to get the contents of the 2nd element.
res[1].text.strip
# => "xxxxxxxx"

But you can also use .at to target it directly:

res = html.at("#gaz-content-body .app-settings span:nth-child(2)")
res.text.strip
# => "xxxxxxxx"
