Scraping special characters with Python Beautiful Soup

Time: 2016-06-26 03:06:45

Tags: python utf-8 beautifulsoup ascii

How can I remove (or encode) the special characters from the page referenced below?

    $('#add').click(function(){
  var u =  "<?php echo $_SESSION["user"];?>";
  var f =  "<?php echo $_GET["u"];?>"; 
   $.ajax({
      type: "POST",

      url: "?u="+u+"&add="+f,
      success: function(){
         alert("success");
      }
   });
});


 <a id='add' class='btn btn-success' href=\"#\">Add as friend</a>

1 answer:

Answer 0: (score: -1)

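Unicode objects can only be printed if they can be converted to ASCII. If the text contains characters that ASCII cannot encode, you get that error. You may want to encode it explicitly and then print the resulting soup: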

import requests
from bs4 import BeautifulSoup

link = "https://www.sec.gov/Archives/edgar/data/4281/000119312513062916/R2.htm"

# Browser-like headers sent with the request
request_headers = {"Accept-Language": "en-US,en;q=0.5", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Referer": "http://google.com", "Connection": "keep-alive"}
response = requests.get(link, headers=request_headers)
soup = BeautifulSoup(response.text, "lxml")
# Encode explicitly to UTF-8 bytes instead of relying on an implicit ASCII conversion
print(soup.encode('utf-8'))
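
The question also asks how to remove the special characters rather than encode them. A minimal sketch of both options follows, working on a plain string such as the one `soup.get_text()` returns. The helper names `strip_non_ascii` and `escape_non_ascii` and the sample string are illustrative assumptions, not part of the original answer.

```python
import unicodedata


def strip_non_ascii(text):
    """Drop every character that has no ASCII equivalent (hypothetical helper)."""
    # NFKD splits accented letters into a base letter plus a combining mark
    # and maps compatibility characters (e.g. no-break space) to plain ones,
    # so "é" survives as "e" instead of vanishing entirely.
    normalized = unicodedata.normalize("NFKD", text)
    return normalized.encode("ascii", "ignore").decode("ascii")


def escape_non_ascii(text):
    """Keep the information by writing non-ASCII characters as &#NNN; references."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Made-up sample: e-acute, euro sign, and a no-break space
sample = "caf\u00e9 \u20ac5\u00a0total"
print(strip_non_ascii(sample))   # cafe 5 total
print(escape_non_ascii(sample))  # caf&#233; &#8364;5&#160;total
```

Stripping loses information (the euro sign disappears), so the escaping variant may be the safer choice when the output has to stay ASCII but remain readable as HTML.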