Why is every "brand_copy" class duplicated?

Time: 2020-08-23 09:15:04

Tags: html css python-3.x

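I am scraping a website and generating an HTML file, as shown below. Please edit the chromedriver.exe directory and the output path before running. What seems strange to me is that the "brand_copy" class is duplicated.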

Could you please explain the cause of this problem in detail?

from bs4 import BeautifulSoup
from selenium import webdriver

# Edit this path to point at your local chromedriver.exe.
driver = webdriver.Chrome(r'C:\Users\Dung Le\Downloads\chromedriver.exe')

head1 = r'''
<link rel="stylesheet" type="text/css" href="all.css">
<script type="text/javascript">_cb='';function ById(id){return document.getElementById(id)};var info={IsApp:0,word:'love',Domain:'thefreedictionary.com',PageLang:'en',canonical:'https://www.thefreedictionary.com/love',tab:1,sp:0,mode:0,flag:3133,isLogin:(/c\d\d[^;]*userTicket=[^;]*/.test(document.cookie)),a:'jj',isExtDomain:false},hp_title='Dictionary, Encyclopedia and Thesaurus - The Free Dictionary',abu='uvmrofqgjsygesplqdlkziovfhqjuchvcc';(function(d,w){var t=/c11.*font=([12]\d)/g.exec(d.cookie);if(t&&t.length>1)d.documentElement.style.fontSize=t[1]+'px';w.bm='';var a=navigator.userAgent,e=a.indexOf('Trident/')>0||a.indexOf('MSIE')>0,f=w.sidebar,t=hp_title,u='http://www.thefreedictionary.com';if(e||f)w.bm='<li><a href="'+(e?'javascript:external.AddFavorite(\''+u+'\',\''+t+'\')':u)+'" title="'+t+'" rel="sidebar">Bookmark</a></li>'})(document,window);
function waiting(id){ById(id).innerHTML='<img id="'+id+'_ld" width="16" height="16" src="//img.tfd.com/m/wait16.gif">';setTimeout(function(){if(ById(id+'_ld'))document.getElementById(id).innerHTML='Problem loading data'},10000)}
</script>
<script src="async.js" async=""></script>
<style type="text/css">
</style>
'''


head2 = r'''
<script>if(window.lib&&window.lib.delayedFunctions)window.lib.delayedExec();window.completed=1;</script>
'''

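url = 'https://www.thefreedictionary.com/love'
driver.get(url)

# Parse the rendered page; only the #MainTxt element is written out below.
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Edit this output path as needed. The with-block closes the file automatically.
with open(r'E:\Crawl TFD\test.html', 'w', encoding='utf8') as out_file:
    out_file.write('love' + head1 + '\n' + str(soup.select_one('#MainTxt')) + '\n' + head2)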

The JavaScript file is used here. Because its size exceeds the allowed limit, I had no choice but to add it as a download link.
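One way to narrow this down, without claiming to know the cause, is to count how many elements carry the "brand_copy" class in the HTML text that was written to disk, and compare that with what the browser shows once the page's scripts have run. The sketch below is only a diagnostic aid. It uses Python's standard-library html.parser instead of BeautifulSoup so that it runs without extra packages, and the names ClassCounter, count_class, and sample are hypothetical helpers introduced for illustration.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "class" and value and self.class_name in value.split():
                self.count += 1


def count_class(html, class_name="brand_copy"):
    """Return how many elements in the HTML string carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical sample markup, not taken from the real page.
sample = (
    '<section>'
    '<div class="brand_copy">a</div>'
    '<p class="x brand_copy">b</p>'
    '<div class="brand_copy_2">c</div>'
    '</section>'
)
print(count_class(sample))  # 2
```

To apply it to the generated file, read the file's text (for example the test.html written by the script above) and pass it to count_class. If the count in the saved file is lower than what the browser's inspector shows, the extra elements are probably being added after the page loads rather than coming from the scraped markup; if the counts match, the duplicates were already present in the #MainTxt element that was scraped.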

0 answers:

No answers yet