How do I correctly output all the links on this news site as a list?
Once they are in a list, how do I return random results (3 to 5 links at a time)?
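Note: the markup I need starts at line 739 of the page source (the position may shift slightly, because the page is refreshed every day), inside
div class="abdominis rlby clearmen"
I need every link of this kind in it:
<a href="https://tw.news.appledaily.com/life/realtime/20180308/1310910/">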
Thanks! Here is my code:
from bs4 import BeautifulSoup
from flask import Flask, request, abort
import requests
import re
import random
import types
target_url = 'http://www.appledaily.com.tw/realtimenews/section/new/'
print('Start parsing appleNews....')
rs = requests.session()
res = rs.get(target_url, verify=False)
soup = BeautifulSoup(res.text, 'html.parser')
# outputs all the links, but as full <a> tags with extra markup I don't need
contents = soup.select("div[class='abdominis rlby clearmen']")[0].find_all('a')
print(contents)
# outputs a single link, but not as a list
#contents = soup.select("div[class='abdominis rlby clearmen']")[0].find('a').get('href')
#print(contents)
Answer 0 (score: 1)
Here is a solution that appends each link to a list, provided it sits inside the specified div.
from bs4 import BeautifulSoup
from flask import Flask, request, abort
import requests
import re
import random
import types
target_url = 'http://www.appledaily.com.tw/realtimenews/section/new/'
print('Start parsing appleNews....')
rs = requests.session()
res = rs.get(target_url, verify=False)
soup = BeautifulSoup(res.text, 'html.parser')
list_links = []  # create an empty list
for a in soup.select("div[class='abdominis rlby clearmen']")[0].find_all(href=True):  # find links inside the div
    list_links.append(a['href'])  # append to the list
    print(a['href'])  # check the links
for l in list_links:  # print the list to the screen (second check)
    print(l)
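If bs4 is not available, the same "every href inside that div" idea can be sketched with the standard library's html.parser instead of BeautifulSoup. This is a swapped-in technique, not what the answer uses: `DivLinkParser` and `links_in_div` are hypothetical names, the class must match exactly (like the `div[class='...']` selector), links are collected from every matching div rather than only the first (`[0]`), and badly unbalanced `<div>` tags in real pages can throw off the depth counting, which BeautifulSoup handles more robustly.

```python
from html.parser import HTMLParser


class DivLinkParser(HTMLParser):
    """Collect href values of <a> tags inside a <div> whose class string matches exactly."""

    def __init__(self, div_class):
        super().__init__()
        self.div_class = div_class
        self.depth = 0   # > 0 while inside a target div; counts nested divs
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1   # a nested div inside the target div
            elif attrs.get("class") == self.div_class:
                self.depth = 1    # entered a target div
        elif tag == "a" and self.depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1       # back to 0 means we left the target div


def links_in_div(html, div_class):
    parser = DivLinkParser(div_class)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup standing in for the real page.
sample = """
<div class="other"><a href="/skip">x</a></div>
<div class="abdominis rlby clearmen">
  <ul><li><a href="https://example.com/a">A</a></li>
  <li><div><a href="https://example.com/b">B</a></div></li></ul>
</div>
<a href="/after">after</a>
"""
print(links_in_div(sample, "abdominis rlby clearmen"))
```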
To return random links:
import random  # import the random module
random_list = []  # list to collect the random links, if needed
random.shuffle(list_links)  # shuffle the list in place
for i in range(5):  # number of links to pick (5 in this example)
    try:
        link = list_links.pop()  # the list is already shuffled, so the last item is a random one
        print(link)  # print to the screen
        random_list.append(link)  # or append it to random_list
    except IndexError:  # list_links ran out of items
        break
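When picking a random index into a list, keep in mind that `random.randint(a, b)` includes both endpoints, so `random.randint(0, len(items))` can return `len(items)`, one past the last valid index; if an `except IndexError` catches that, the loop quietly returns fewer items than asked for. `random.randrange(n)` excludes `n`, so it only yields valid indices. A minimal sketch, where `pop_random` is a hypothetical helper name:

```python
import random


def pop_random(items):
    """Remove and return one randomly chosen element of `items`.

    random.randrange(n) yields 0..n-1, exactly the valid indices;
    random.randint(0, n) could also yield n and raise IndexError.
    """
    return items.pop(random.randrange(len(items)))


links = ["a", "b", "c"]
picked = [pop_random(links) for _ in range(2)]
print(picked, links)
```

Note that `random.randrange(0)` raises `ValueError`, so callers still need to stop once the list is empty.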
Edit: since you asked for the links to be returned, here is a function that returns a list of `num` random links:
def return_random_link(list_, num):
    """Takes in a list and returns up to num randomly chosen items (note: it removes them from list_)."""
    random.shuffle(list_)
    random_list = []
    for i in range(num):
        try:  # try to append to the list
            r = list_.pop()  # list_ is shuffled, so the last item is a random one
            random_list.append(r)
        except IndexError:  # no items left
            return random_list  # return the items collected so far
    return random_list

random_list = return_random_link(list_links, 5)
for i in random_list:
    print(i)
Answer 1 (score: 1)
If you want the link tags without their descendants, you can clear them:
for elm in contents:
    elm.clear()
I imagine, though, that you are more interested in extracting the links:
contents = [a['href'] for a in contents]
To get the results in random order, use random.shuffle() and then take as many elements as you need from the shuffled list.
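A minimal sketch of that suggestion, sized for the "3 to 5 links at a time" in the question. `pick_random_links` is a hypothetical helper name, and the sample URLs are placeholders; in the thread the input would be the `contents` list of hrefs.

```python
import random


def pick_random_links(links, low=3, high=5):
    """Return between `low` and `high` links (inclusive) in random order.

    Shuffles a copy, so the caller's list is left untouched.
    If fewer links exist than requested, all of them are returned.
    """
    shuffled = list(links)             # copy so the original list survives
    random.shuffle(shuffled)           # in-place shuffle of the copy
    count = random.randint(low, high)  # randint includes both ends: 3, 4 or 5
    return shuffled[:count]            # slicing never raises, even if count > len


links = [f"https://example.com/news/{n}" for n in range(10)]
print(pick_random_links(links))
```

`random.sample(links, k)` would also work without the explicit copy, but it raises `ValueError` when `k` exceeds the number of links, whereas the slice simply returns everything available.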