I'm writing a script to pull the "solidfiles.com" links from a certain website. I already have all of the href links, but I can't work out how to keep only the solidfiles.com links using Python.
This is the website I'm trying to fetch links from.
Here is my current script:
import requests
from bs4 import BeautifulSoup

Link = 'https://animetosho.org/view/jacobswaggedup-kill-la-kill-bd-1280x720-mp4-batch.n677876'
q = requests.get(Link)
soup = BeautifulSoup(q.text, 'html.parser')
#print(soup)

# Every block of download links on the page
subtitles = soup.find_all('div', {'class': 'links'})
#print(subtitles)

with open("Anilinks.txt", "w") as f:
    for link in subtitles:
        x = link.find_all('a', limit=26)
        for a in x:
            url = a['href']
            f.write(url + '\n')
With this, I write all the links to a text file named "Anilinks.txt". I just can't seem to keep only the solidfiles links. Any tips would be great.
Answer 0: (score 2)
This should work (if you already have the .txt file):
# Store the links we need in a list
links_to_keep = []
with open("Anilinks.txt", "r") as f:
    for line in f.readlines():
        if 'solidfiles.com' in line:
            links_to_keep.append(line)

# Write all the links in our list back to the file
with open("Anilinks.txt", "w") as f:
    for link in links_to_keep:
        f.write(link)
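The same read-filter-rewrite pass can also be written more compactly with a list comprehension; a minimal sketch, assuming the Anilinks.txt file from the question already exists:

# Read every line, keep only those containing 'solidfiles.com'
with open("Anilinks.txt", "r") as f:
    links_to_keep = [line for line in f if 'solidfiles.com' in line]

# Overwrite the file with just the filtered links
with open("Anilinks.txt", "w") as f:
    f.writelines(links_to_keep)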
Or you can filter the links before writing them to the file, in which case the last part of your code would look like this:
with open("Anilinks.txt", "w") as f:
    for link in subtitles:
        x = link.find_all('a', limit=26)
        for a in x:
            if 'solidfiles.com' in a['href']:
                url = a['href']
                f.write(url + '\n')
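If a plain substring test turns out to be too loose (it would also match a URL that merely mentions solidfiles.com somewhere in its query string), the hostname can be checked explicitly with urllib.parse from the standard library. A minimal sketch, reusing the subtitles variable from the question:

from urllib.parse import urlparse

with open("Anilinks.txt", "w") as f:
    for link in subtitles:
        for a in link.find_all('a', limit=26):
            url = a['href']
            # Keep the link only if its host is solidfiles.com (or a subdomain of it)
            host = urlparse(url).netloc
            if host == 'solidfiles.com' or host.endswith('.solidfiles.com'):
                f.write(url + '\n')

Relative hrefs have an empty netloc, so they are skipped automatically by this check.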