I wrote a regular expression to search for tags, as follows:
<a href=\".+\" rel=\"nofollow\"><strong>دانلود</strong></a>
But instead of the individual links, I get a single huge match that also contains other HTML tags.
My HTML is:
<div class="download-51803-links">
<h3>لینک های دانلود</h3>
<span class="instruction-expander">راهنمای دانلود</span>
<script type="text/javascript">
link=('report/' + 'pop-up.php')
document.write('<a class="dbox cboxElement" target="_blank" rel="nofollow" href="http://p30download.com/' + link + '?report-id=77722&report-bid=18&report-title=دانلود Machine Learning A Z Hands-On Python & R In Data Science آموزش کامل یادگیری ماشین آشنایی با پایتون و آر در علوم داده" style="padding:0px" ><span class="report-link">گزارش خرابی</span></a>')
</script>
<p dir="rtl"><img alt="اطلاعات" class="image-text-top" src="http://p30download.com/template/icons/set3/exclaim.gif" title="اطلاعات"/> <strong>حجم</strong>: 5.06 گیگابایت<br><img alt="دانلود" class="image-text-top" src="http://p30download.com/template/icons/set3/arrow-down.gif" title="دانلود"/> <a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part1.rar" rel="nofollow"><strong>دانلود</strong></a> - بخش اول<br><img alt="دانلود" class="image-text-top" src="http://p30download.com/template/icons/set3/arrow-down.gif" title="دانلود"/> <a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part2.rar" rel="nofollow"><strong>دانلود</strong></a> - بخش دوم<br><img alt="دانلود" class="image-text-top" src="http://p30download.com/template/icons/set3/arrow-down.gif" title="دانلود"/> <a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part3.rar" rel="nofollow"><strong>دانلود</strong></a> - بخش سوم<br><img alt="دانلود" class="image-text-top" src="http://p30download.com/template/icons/set3/arrow-down.gif" title="دانلود"/> <a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part4.rar" rel="nofollow"><strong>دانلود</strong></a> - بخش چهارم<br><img alt="دانلود" class="image-text-top" src="http://p30download.com/template/icons/set3/arrow-down.gif" title="دانلود"/> <a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part5.rar" rel="nofollow"><strong>دانلود</strong></a> - بخش پنجم<br><img alt="دانلود" class="image-text-top" src="http://p30download.com/template/icons/set3/arrow-down.gif" 
title="دانلود"/> <a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part6.rar" rel="nofollow"><strong>دانلود</strong></a> - بخش ششم</br></br></br></br></br></br></p>
</div>
How can I extract 4 of these items as <a> tags, for example:
<a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part1.rar" rel="nofollow"><strong>دانلود</strong></a>
Answer 0 (score: 0)
Here is a solution using Beautiful Soup.
html = """ <div class="download-51803-links">
<h3>لینک های دانلود</h3>
<span class="instruction-expander">راهنمای دانلود</span>
<script type="text/javascript">
link=('report/' + 'pop-up.php')
document.write('<a class="dbox cboxElement" target="_blank" rel="nofollow" href="http://p30download.com/' + link + '?report-id=77722&report-bid=18&report-title=دانلود Machine Learning A Z Hands-On Python & R In Data Science آموزش کامل یادگیری ماشین آشنایی با پایتون و آر در علوم داده" style="padding:0px" ><span class="report-link">گزارش خرابی</span></a>')
</script>
<p dir="rtl"><img alt="اطلاعات" class="image-text-top" src="http://p30download.com/template/icons/set3/exclaim.gif" title="اطلاعات"/> <strong>حجم</strong>: 5.06 گیگابایت<br><img alt="دانلود" class="image-text-top" src="http://p30download.com/template/icons/set3/arrow-down.gif" title="دانلود"/> <a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part1.rar" rel="nofollow"><strong>دانلود</strong></a> - بخش اول<br><img alt="دانلود" class="image-text-top" src="http://p30download.com/template/icons/set3/arrow-down.gif" title="دانلود"/> <a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part2.rar" rel="nofollow"><strong>دانلود</strong></a> - بخش دوم<br><img alt="دانلود" class="image-text-top" src="http://p30download.com/template/icons/set3/arrow-down.gif" title="دانلود"/> <a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part3.rar" rel="nofollow"><strong>دانلود</strong></a> - بخش سوم<br><img alt="دانلود" class="image-text-top" src="http://p30download.com/template/icons/set3/arrow-down.gif" title="دانلود"/> <a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part4.rar" rel="nofollow"><strong>دانلود</strong></a> - بخش چهارم<br><img alt="دانلود" class="image-text-top" src="http://p30download.com/template/icons/set3/arrow-down.gif" title="دانلود"/> <a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part5.rar" rel="nofollow"><strong>دانلود</strong></a> - بخش پنجم<br><img alt="دانلود" class="image-text-top" src="http://p30download.com/template/icons/set3/arrow-down.gif" 
title="دانلود"/> <a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part6.rar" rel="nofollow"><strong>دانلود</strong></a> - بخش ششم</br></br></br></br></br></br></p>
</div>"""
from bs4 import BeautifulSoup
import random

soup = BeautifulSoup(html, 'html.parser')

# Collect every <a> tag that has an href attribute
list_links = list(soup.find_all('a', href=True))

def return_links(list_, num):
    """Return up to num items picked at random from list_, without repeats."""
    # random.sample raises ValueError when asked for more items than exist,
    # so cap the count at the length of the list
    return random.sample(list_, min(num, len(list_)))

list_of_links = return_links(list_links, 4)
for link in list_of_links:
    print(link)
Sample output (the links are picked at random, so the selection and order vary between runs):
<a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part3.rar" rel="nofollow"><strong>دانلود</strong></a>
<a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part5.rar" rel="nofollow"><strong>دانلود</strong></a>
<a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part6.rar" rel="nofollow"><strong>دانلود</strong></a>
<a href="http://cdn.p30download.com/?b=p30dl-tutorial&f=Udemy.Machine.Learning.A.Z..Hands.On.Python.and.R.In.Data.Science.Updated.1.2018_p30download.com.part1.rar" rel="nofollow"><strong>دانلود</strong></a>
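As for why the regex in the question returns one huge match: `.+` is greedy, so it runs from the first `href="` to the last `" rel="nofollow">` it can find on the line and swallows everything in between. Below is a minimal sketch of a bounded pattern that uses `[^"]+` so the match cannot run past the closing quote of the href. The `html` string is a shortened, hypothetical stand-in for the real page (the example.com URLs are made up), and the names `greedy` and `bounded` are only for illustration.

```python
import re

# Shortened, hypothetical stand-in for the page markup: same tag shape
# as in the question, but with made-up example.com URLs.
html = (
    '<p><img alt="x" src="http://example.com/a.gif"/> '
    '<a href="http://example.com/file.part1.rar" rel="nofollow"><strong>دانلود</strong></a> - 1<br>'
    '<img alt="x" src="http://example.com/a.gif"/> '
    '<a href="http://example.com/file.part2.rar" rel="nofollow"><strong>دانلود</strong></a> - 2</p>'
)

# Pattern from the question: ".+" is greedy and spans several tags.
greedy = re.compile(r'<a href=".+" rel="nofollow"><strong>دانلود</strong></a>')

# Bounded pattern: [^"]+ stops at the first closing quote of the href value.
bounded = re.compile(r'<a href="[^"]+" rel="nofollow"><strong>دانلود</strong></a>')

greedy_matches = greedy.findall(html)
bounded_matches = bounded.findall(html)

print(len(greedy_matches))   # one match covering both links
for match in bounded_matches:
    print(match)             # one <a> tag per match
```

This only works while the attribute order and quoting stay exactly as shown, which is why an HTML parser is usually the more robust choice for anything beyond a quick one-off.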