Using requests with a simple form in Python

Asked: 2017-07-25 17:33:59

Tags: python web-scraping python-requests

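I am trying to use Python to scrape example sentences for a given French word, but the page I get back in Python does not seem to contain any results.

I have inspected the search box and search button elements and included them as parameters. Maybe I am missing something?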

http://www.online-languages.info/french/examples.php

import requests
from bs4 import BeautifulSoup

word = 'manger'
url='http://www.online-languages.info/french/examples.php'
params ={'word':word,'go':''}

response=requests.post(url, data=params)
soup = BeautifulSoup(response.text, 'html5lib')
print(soup.prettify())

Here's what I'm looking to get:

Edit: Here is the output I get. It looks like the page may be using JavaScript. If so, can anyone suggest a different library?

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html dir="ltr" lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>
   French example sentences :: Online-languages.info
  </title>
  <meta content="text/css" http-equiv="Content-Style-Type"/>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <meta content="Database containing thousands of example sentences. Sentences are important for learning correct use of words." name="Description"/>
  <meta content="French language. French grammar. French vocabulary. Tests. Language certificate. Verbs. French phrases. French pronunciation. E-learning. Conversation." name="Subject"/>
  <meta content="French, French grammar, French dictionary, French vocabulary, French language, tests, French test, exam, fce, verbs, exercise, certificate, course, games" name="keywords"/>
  <link href="../style.css" rel="stylesheet" type="text/css"/>
 </head>
 <body style="background-image:url(./img/bg2.jpg);">
  <div align="center">
   <table bgcolor="white" border="0" cellpadding="6" cellspacing="0" style="-moz-border-radius:20px;" width="1000">
    <tbody>
     <tr>
      <td align="center" colspan="4">
       <table border="0" cellspacing="0" width="100%">
        <tbody>
         <tr>
          <td align="center" width="180">
           <a href="../">
            <img alt="Online-languages.info" border="0" src="img/logo.png"/>
           </a>
          </td>
          <td align="left" style="background: url('img/bg.png'); -moz-border-radius:20px; padding: 20px 20px 20px 20px; ">
           <h1 style="color:#fff; font-size:20pt;">
            French words in example sentences
           </h1>
           <h3 style="color:#fff; font-size:8pt; font-weight:normal;">
            French language resources at
            <a href="http://www.online-languages.info" style="color:white;">
             Online-languages.info
            </a>
           </h3>
          </td>
         </tr>
        </tbody>
       </table>
      </td>
     </tr>
     <tr>
      <td align="left" valign="top" width="180">
       <table cellpadding="0" cellspacing="0" class="t2" width="180">
        <tbody>
         <tr>
          <td>
           <a class="arect" href="index.php">
            Home
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="grammar.php">
            French grammar
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="phrases.php">
            French phrases
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="vocabulary.php">
            French vocabulary
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="trainer.php">
            Vocabulary trainer
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="picture-dictionary.php">
            Picture dictionary
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="dictionary.php">
            French dictionary
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="flashcards.php">
            Flashcards
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="audio.php">
            Audio
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="video.php">
            Video
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="translator.php">
            French translator
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="tests.php">
            French quizzes
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="examples.php">
            Examples of use
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="pronunciation.php">
            French pronunciation
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="news.php">
            News in French
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="applications.php">
            Language software
           </a>
          </td>
         </tr>
         <tr>
          <td>
           <a class="arect" href="mobile.php">
            Mobile phones
           </a>
          </td>
         </tr>
        </tbody>
       </table>
       <img alt="" border="0" height="0" src="http://whos.amung.us/swidget/fnhahzdo0ncz.gif" style="display:none;" width="0"/>
      </td>
      <td align="left" bgcolor="#ffffff" valign="top" width="90%">
       <script type="text/javascript">
        <!--
google_ad_client = "ca-pub-7058441231119392";
/* online-languages */
google_ad_slot = "3704078504";
google_ad_width = 728;
google_ad_height = 90;
//-->
       </script>
       <script src="http://pagead2.googlesyndication.com/pagead/show_ads.js" type="text/javascript">
       </script>
       <br/>
       <br/>
       <div align="justify">
        <div id="content">
         <iframe frameborder="0" height="650" src="http://www.dicts.info/examples.php?lang=French&amp;disa=1" width="95%">
         </iframe>
        </div>
       </div>
       <!-- cookieconsent2 by Silktide -->
       <script type="text/javascript">
        window.cookieconsent_options = {
learnMore: 'More info',
message: 'This website uses cookies to personalize content and to improve your experience on our website.',
link: 'https://www.google.com/policies/technologies/cookies/',
theme: 'light-bottom'
};
       </script>
       <script src="https://s3.amazonaws.com/cc.silktide.com/cookieconsent.latest.min.js" type="text/javascript">
       </script>
       <noscript>
        &lt;p&gt;We recommend you enable JavaScript to take full advantage of this website.&lt;/p&gt;
       </noscript>
      </td>
     </tr>
    </tbody>
   </table>
   <br/>
   <table width="700">
    <tbody>
     <tr>
      <td align="center">
       <a href="../english">
        <img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&amp;url=http://www.jazyky-online.info/anglictina"/>
        <br/>
        English
       </a>
      </td>
      <td align="center">
       <a href="../german">
        <img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&amp;url=http://www.jazyky-online.info/spanelstina"/>
        <br/>
        German
       </a>
      </td>
      <td align="center">
       <a href="../french">
        <img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&amp;url=http://www.jazyky-online.info/francouzstina"/>
        <br/>
        French
       </a>
      </td>
      <td align="center">
       <a href="../spanish">
        <img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&amp;url=http://www.jazyky-online.info/spanelstina"/>
        <br/>
        Spanish
       </a>
      </td>
      <td align="center">
       <a href="../russian">
        <img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&amp;url=http://www.jazyky-online.info/rustina"/>
        <br/>
        Russian
       </a>
      </td>
      <td align="center">
       <a href="../chinese">
        <img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&amp;url=http://www.jazyky-online.info/cinstina"/>
        <br/>
        Chinese
       </a>
      </td>
     </tr>
    </tbody>
   </table>
   <br/>
   <br/>
   <table cellpadding="10" style="background:url(img/bgfoot.jpg);" width="100%">
    <tbody>
     <tr>
      <td align="center">
       <font color="#0000aa">
        <a href="../licence.html">
         Licence
        </a>
        |
        <a href="../licence.html">
         Terms of use
        </a>
        |
        <a href="../licence.html#disclaimer">
         Disclaimer
        </a>
        |
        <a href="../licence.html#privacy">
         Privacy policy
        </a>
        |
        <a href="http://www.dicts.info/contact.php?s=Online-languages">
         Contact
        </a>
       </font>
       <br/>
       Copyright © 2007-2017, Online-languages.info
      </td>
     </tr>
    </tbody>
   </table>
  </div>
  <script type="text/javascript">
   var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
  </script>
  <script type="text/javascript">
   try {
var pageTracker = _gat._getTracker("UA-8795372-1");
pageTracker._trackPageview();
} catch(err) {}
  </script>
 </body>
</html>

1 Answer:

Answer 0 (score: 1)

This worked for me. Note that I used the GET method and the URI referenced in the actual form on that page.

import requests

word = 'manger'
url ='http://www.dicts.info/examples.php'
headers = {'Referer': 'http://www.dicts.info/examples.php?disa=1&lang2=french&word=bon&go=Search'}
params = {'word':word,'disa':'1','lang2':'french'}

response = requests.get(url, params=params, headers=headers)
print(response.text)
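
The URI used above can also be found programmatically. The output in the question shows that the search form is not part of the outer page; it sits in an `<iframe>` whose `src` points at dicts.info. Below is a minimal sketch that pulls that `src` out of the HTML with only the standard library. The names `IframeSrcParser` and `find_iframe_srcs` are hypothetical helpers introduced for illustration, and `sample_html` is a trimmed copy of the relevant lines from the question's output rather than a live download.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values arrive already unescaped
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_iframe_srcs(html, base_url):
    """Return absolute URLs for all iframes found in html."""
    parser = IframeSrcParser()
    parser.feed(html)
    # urljoin leaves absolute URLs untouched and resolves relative ones
    return [urljoin(base_url, src) for src in parser.sources]


# Trimmed from the output posted in the question
sample_html = (
    '<div id="content">'
    '<iframe frameborder="0" height="650" '
    'src="http://www.dicts.info/examples.php?lang=French&amp;disa=1" width="95%">'
    '</iframe></div>'
)
page_url = "http://www.online-languages.info/french/examples.php"

iframe_urls = find_iframe_srcs(sample_html, page_url)
print(iframe_urls)
```

With a live page, `response.text` from the original request could be passed in place of `sample_html`. The same lookup also works with BeautifulSoup via `soup.find_all('iframe')`.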

Update

It seems the PHP page checks that an appropriate Referer header was sent with the request. So add one, as shown above (I edited the original code).
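
To confirm that the Referer header really is attached before anything goes over the network, one option is to build the request with a `requests.Session` and inspect the prepared request. This is only a sketch: it reuses the URL, parameters, and Referer value from the code above, and the commented-out `send` call with its `timeout` value is an illustrative assumption, not something the site is known to require.

```python
import requests

word = "manger"
url = "http://www.dicts.info/examples.php"
referer = "http://www.dicts.info/examples.php?disa=1&lang2=french&word=bon&go=Search"

# A session sends its default headers with every request made through it
session = requests.Session()
session.headers.update({"Referer": referer})

params = {"word": word, "disa": "1", "lang2": "french"}
request = requests.Request("GET", url, params=params)

# prepare_request merges the session headers into the request without sending it
prepared = session.prepare_request(request)

print(prepared.url)
print(prepared.headers["Referer"])

# To actually perform the request (network access required):
# response = session.send(prepared, timeout=10)
# print(response.text)
```

Setting the header on the session rather than on each call keeps it in place if several words are looked up in a loop.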