How can I scrape the content between every two sibling <hr> tags?

Time: 2019-08-12 05:21:25

Tags: python html css web-scraping css-selectors

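It is hard to describe my actual situation, so I will just point to the site: https://www.w3schools.com/php/php_intro.asp

The markup below is very long, so you can just skim it. When you open the link, you will see that each content block is framed by a horizontal rule (an hr tag) above and below it, so my goal is to scrape the content of each block that sits between two hr tags.

(The real difficulty is that the number of tags is not fixed and the structure between each pair of hr tags varies unpredictably.)

How can I do this?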

<div class="w3-col l10 m12" id="main">
      <div id="mainLeaderboard" style="overflow:hidden;">
        <!-- MainLeaderboard-->

        <!--<pre>main_leaderboard, all: [728,90][970,90][320,50][468,60]</pre>-->
        <div id="snhb-main_leaderboard-0" data-google-query-id="CJmd77_F_OMCFUSJwgodAWAIsg"><div id="google_ads_iframe_/22152718/sws-hb//w3schools.com//main_leaderboard_0__container__" style="border: 0pt none;"><iframe id="google_ads_iframe_/22152718/sws-hb//w3schools.com//main_leaderboard_0" title="3rd party ad content" name="google_ads_iframe_/22152718/sws-hb//w3schools.com//main_leaderboard_0" width="468" height="60" scrolling="no" marginwidth="0" marginheight="0" frameborder="0" srcdoc="" style="border: 0px; vertical-align: bottom;" data-google-container-id="d" data-load-complete="true"></iframe></div></div>
        <!-- adspace leaderboard -->

      </div>
<h1>Python <span class="color_h1">Tutorial</span></h1>
<div class="w3-clear nextprev">
<a class="w3-left w3-btn" href="/default.asp">❮ Home</a>
<a class="w3-right w3-btn" href="python_intro.asp">Next ❯</a>
</div>

<div class="w3-panel w3-info intro">
<p>Python is a programming language.</p>
<p>Python can be used on a server to create web applications.</p>
<a class="w3-btn w3-margin-bottom" href="python_intro.asp">Start learning Python now »</a>
</div>

<hr>

<h2>Learning by Examples</h2>
<p>Our "Show Python" tool makes it easy to learn Python, it shows both the 
code and the result.</p>

<div class="w3-example">
<h3>Example</h3>
<div class="w3-code notranslate pythonHigh"><span class="pythoncolor" style="color:black">
<span class="pythonkeywordcolor" style="color:mediumblue">print</span>(<span class="pythonstringcolor" style="color:brown">"Hello, World!"</span>)<span class="pythonnumbercolor" style="color:red">
</span> </span></div>
<a target="_blank" class="w3-btn w3-margin-bottom" href="showpython.asp?filename=demo_default">Run example »</a>
</div>

<p><b>Click on the "Run example" button to see how it works.</b></p>
<hr>

<h2>Python File Handling</h2>
<p>In our File Handling section you will learn how to open, read, write, and 
delete files.</p>
<p><a href="python_file_handling.asp">Python File Handling</a></p>

<hr>

<h2>Python Database Handling</h2>
<p>In our database section you will learn how to access and work with MySQL and MongoDB databases:</p>
<p><a href="python_mysql_getstarted.asp">Python MySQL Tutorial</a></p>

<p><a href="python_mongodb_getstarted.asp">Python MongoDB Tutorial</a></p>

<hr>

<h2>Python Exercises</h2>
<form autocomplete="off" id="w3-exerciseform" action="exercise.asp?filename=exercise_syntax1" method="post" target="_blank">
<h2>Test Yourself With Exercises</h2>
<div class="exercisewindow">
<h2>Exercise:</h2>
<p>Insert the missing part of the code below to output "Hello World".</p>
<div class="exerciseprecontainer">
<pre><input name="ex1" maxlength="5" style="width: 54px;">("Hello World")
</pre>
</div>
<br>
<button type="submit" class="w3-btn w3-margin-bottom">Submit Answer »</button>
<p><a target="_blank" href="exercise.asp?filename=exercise_syntax1">Start the Exercise</a></p>
</div>
</form>

<hr>
<div id="midcontentadcontainer" style="overflow:auto;text-align:center">
<!-- MidContent -->

  <!--<pre>mid_content, all: [300,250][336,280][728,90][970,250][970,90][320,50][468,60]</pre>-->
  <div id="snhb-mid_content-0" data-google-query-id="CNqS8r_F_OMCFUSJwgodAWAIsg"><div id="google_ads_iframe_/22152718/sws-hb//w3schools.com//mid_content_0__container__" style="border: 0pt none;"><iframe id="google_ads_iframe_/22152718/sws-hb//w3schools.com//mid_content_0" title="3rd party ad content" name="google_ads_iframe_/22152718/sws-hb//w3schools.com//mid_content_0" width="336" height="280" scrolling="no" marginwidth="0" marginheight="0" frameborder="0" srcdoc="" style="border: 0px; vertical-align: bottom;" data-google-container-id="f" data-load-complete="true"></iframe></div></div>

</div>
<hr>

<h2>Python Examples</h2>
<p>Learn by examples! This tutorial supplements all explanations with clarifying examples.</p>
<p><a href="python_examples.asp" class="w3-button w3-light-grey">See All Python Examples</a></p>
<hr>

<h2>Python Quiz</h2>
<p>Learn by taking a quiz! This quiz will give you a signal of how much you know, or do not know, about Python.</p>
<p><a href="python_quiz.asp" class="w3-btn w3-blue">Python Quiz</a></p>
<hr>


<h2>Python Reference</h2>
<p>You will also find complete function and method references:</p>
<p><a href="python_reference.asp">Reference Overview</a></p>
<p><a href="python_ref_functions.asp">Built-in Functions</a></p>
<p><a href="python_ref_string.asp">String Methods</a></p>
<p><a href="python_ref_list.asp">List/Array Methods</a></p>
<p><a href="python_ref_dictionary.asp">Dictionary Methods</a></p>
<p><a href="python_ref_tuple.asp">Tuple Methods</a></p>
<p><a href="python_ref_set.asp">Set Methods</a></p>
<p><a href="python_ref_file.asp">File Methods</a></p>
<p><a href="python_ref_keywords.asp">Python Keywords</a></p>
<hr>
<h2>Download Python</h2>
<p>Download Python from the official Python web site:
  <a target="_blank" href="https://python.org/">https://python.org</a></p>
<hr>

<h2>Python Exam - Get Your Diploma!</h2>
<div class="w3-row">
<div class="w3-third w3-container w3-padding-24"><a href="/cert/default.asp"><img src="/images/w3certified_logo_250.png" style="max-width:100%;" alt="W3Schools Certification"></a> </div>
<div class="w3-twothird w3-container"><h2>W3Schools' Online Certification</h2>
<p>The perfect solution for professionals who need to balance work, family, and career building.</p>
<p>More than 25 000 certificates already issued!</p>
</div>
</div>
<p><a class="w3-btn" href="/cert/default.asp">Get Your Certificate »</a></p>
<p style="clear:both;">The <a href="/cert/default.asp">HTML Certificate</a> documents your knowledge of HTML.</p>
<p>The <a href="/cert/default.asp">CSS Certificate</a> documents your knowledge of advanced CSS.</p>
<p>The <a href="/cert/default.asp">JavaScript Certificate</a> documents your knowledge of JavaScript and HTML DOM.</p>
<p>The <a href="/cert/default.asp">Python Certificate</a> documents your knowledge of Python.</p>
<p>The <a href="/cert/default.asp">jQuery Certificate</a> documents your knowledge of jQuery.</p>
<p>The <a href="/cert/default.asp">SQL Certificate</a> documents your knowledge of SQL.</p>
<p>The <a href="/cert/default.asp">PHP Certificate</a> documents your knowledge of PHP and MySQL.</p>
<p>The <a href="/cert/default.asp">XML Certificate</a> documents your knowledge of XML, XML DOM and XSLT.</p>
<p>The <a href="/cert/default.asp">Bootstrap Certificate</a> documents your knowledge of the Bootstrap framework.</p>


<div class="w3-clear nextprev">
<a class="w3-left w3-btn" href="/default.asp">❮ Home</a>
<a class="w3-right w3-btn" href="python_intro.asp">Next ❯</a>
</div>
</div>

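1 answer:

Answer 0 (score: 0)

I am not sure I understood the question correctly, but if you only want to adjust the layout, you can do that with CSS alone: organize your content into "div blocks", give every block the same class, and instead of using hr, put a border on the bottom of each block, like this: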

#main{ max-width:1170px;  margin: 0 auto;}
.bg_block{ width:100%; border-bottom: 1px solid #666; padding: 20px; box-sizing: border-box;}
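
For the scraping side of the question, here is a minimal sketch in Python. It assumes the markup is reasonably well formed (every non-void element has a closing tag, as in the snippet from the question) and that the blocks are direct children of the element with id "main". The names `HrSplitter` and `split_by_hr` are made up for this illustration. It uses only the standard-library `html.parser` module, tracks nesting depth, and starts a new block whenever it meets an `<hr>` that is a direct child of the container, so the number of tags and the structure inside each block do not matter.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HrSplitter(HTMLParser):
    """Collect the text of each segment delimited by <hr> tags that are
    direct children of the element whose id is `container_id`."""

    def __init__(self, container_id="main"):
        super().__init__(convert_charrefs=True)
        self.container_id = container_id
        self.depth = 0       # 0 = outside the container, 1 = direct-child level
        self.blocks = [[]]   # one list of text fragments per segment

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            # Still looking for the container element.
            if dict(attrs).get("id") == self.container_id:
                self.depth = 1
            return
        if tag == "hr" and self.depth == 1:
            self.blocks.append([])   # a sibling <hr> starts a new segment
        elif tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.blocks[-1].append(text)


def split_by_hr(html, container_id="main"):
    """Return one text string per segment; n <hr> tags give n + 1 segments."""
    parser = HrSplitter(container_id)
    parser.feed(html)
    parser.close()
    return [" ".join(parts) for parts in parser.blocks]
```

The first segment is whatever precedes the first `<hr>` and the last is whatever follows the final one, so `split_by_hr(html)[1:-1]` keeps only the blocks that are strictly between two `<hr>` tags. An `<hr>` nested deeper inside a block is ignored on purpose, since it is not a sibling of the others. Text from unwanted blocks (the ad container, for instance) still has to be filtered out afterwards.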