Question

Using BeautifulSoup, I simply fetch the page's HTML like this: page_html = BeautifulSoup(response.text, 'html.parser')

Running the same code several times on the same page gives me different results. Each variant begins as follows. The ideal variant:

<!DOCTYPE html>
<html lang="en-AU"><head><meta content="text/html; charset=utf-8" 
http-equiv="content-type"/><link href="https://static.tacdn.com/favicon.ico" 
id="favicon" rel="icon" type="image/x-icon"/><link color="#00a680" 
href="https://static.tacdn.com/img2/icons/ta_square.svg" rel="mask-icon" 
sizes="any"/><meta content="#00a680" name="theme-color"/><meta 
content="telephone=no" name="format-detection"/>

The bad variant:

<!DOCTYPE html>
<html lang="en" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="content-type"/>
<link data-rup="long_lived_global_legacy" 
href="https://static.tacdn.com/css2/long_lived_global_legacy-v23937170372a.css" 
rel="stylesheet" type="text/css"/>
<link href="https://static.tacdn.com/favicon.ico" id="favicon" rel="icon" 
type="image/x-icon"/>

I don't understand why the same code on the same URL produces two differently formatted HTML documents.
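One way to narrow this down is to compare the raw response bodies before they reach BeautifulSoup. The sketch below is only a diagnostic aid, not a confirmed explanation: it assumes the difference might already be present in response.text (the two variants above declare different lang values, "en-AU" and "en"), which the question does not establish. The helper names fingerprint, html_lang and same_body are hypothetical, and the two sample strings are shortened copies of the variants shown above.

```python
import hashlib
import re


def fingerprint(html_text: str) -> str:
    """Return a short SHA-256 digest of a raw response body."""
    return hashlib.sha256(html_text.encode("utf-8")).hexdigest()[:12]


def html_lang(html_text: str):
    """Return the lang attribute of the <html> tag, or None if it is absent."""
    match = re.search(r'<html\b[^>]*?\blang="([^"]*)"', html_text, re.IGNORECASE)
    return match.group(1) if match else None


def same_body(first: str, second: str) -> bool:
    """True if two raw response bodies are byte-for-byte identical."""
    return fingerprint(first) == fingerprint(second)


# Shortened copies of the two variants from the question (illustrative only).
ideal = '<!DOCTYPE html>\n<html lang="en-AU"><head>'
bad = (
    '<!DOCTYPE html>\n'
    '<html lang="en" xmlns:fb="http://www.facebook.com/2008/fbml">\n<head>'
)

for label, body in (("ideal", ideal), ("bad", bad)):
    print(label, fingerprint(body), html_lang(body))
print("identical bodies:", same_body(ideal, bad))
```

In a real run, the two strings would be the response.text values from two separate requests to the same URL. If their fingerprints differ, the two documents were already different before parsing.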

BeautifulSoup returns two different HTML results

0 answers: