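I am using the following two functions to scrape pages for a song's download link. The function get_song_details scrapes one link and extracts the song title and album, and the function get_download_url scrapes another link and finds the download link for the song title passed as an argument.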

    import requests
    from lxml import html
    import time


    def get_song_details(link):
        page = requests.get(link)
        tree = html.fromstring(page.content)
        # retrieve song title from page
        song = tree.xpath('//font[@class="general"]/b[2]/text()')
        if song:
            song = song[0].strip()
        else:
            raise ValueError("Song Title: Webpage structure has changed.")
        # keep only the part before the first "-", if there is one
        song = song.split("-")[0] if "-" in song else song
        # retrieve album name from link
        tokens = link.split("/")
        album = tokens[5] if len(tokens) > 6 else None
        song_details = {
            "title": song,
            "album": album,
        }
        return song_details


    def get_download_url(song_details):
        title = song_details["title"]
        album = song_details["album"]
        url = "http://www.songspk.site/indian/anjaana_anjaani_2010.html"
        print(song_details, url)
        page = requests.get(url)
        tree = html.fromstring(page.content)
        # collect the href of every link whose text contains the song title
        download_url = tree.xpath('//a[contains(text(), "{0}")]/@href'.format(title))
        return download_url

The following code works well when executed:

    song_details = {
        "title": "Aas Paas Khuda",
        "album": "Anjaana Anjaani"
    }
    print(get_download_url(song_details))

It prints:

    ['http://www.songspk.link/link1/song1.php?songid=7753', 'http://www.songspk.link/link1/song1.php?songid=7759']

However, when I execute the following snippet, I get an empty list, even though the song_details dictionary has the same contents as the hard-coded snippet above:

    song_details = get_song_details("http://www.glamsham.com/music/lyrics/anjaana-anjaani/aas-pass-khuda/1368/3089.htm")
    print(get_download_url(song_details))

I cannot understand why: the song_details parameter has the same title as in the snippet above, yet it still does not work.

Answer 0 (score: 0)
It looks like there is a typo on one of the pages. Note that the song title you end up with is Aas Pass Khuda (the lyrics URL itself contains aas-pass-khuda), but the Songs.PK page only has Aas Paas Khuda. Pass vs Paas. Because the XPath contains() test needs the exact substring, the misspelled title matches no link and you get an empty list.
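
One possible way to tolerate this kind of spelling difference is to compare the scraped title against the link texts approximately instead of exactly. The sketch below uses difflib.get_close_matches from the standard library. The helper name find_closest_title, the cutoff of 0.8, and the link_texts list are assumptions made for illustration; in practice the list would come from the text of the a elements on the Songs.PK page.

```python
import difflib


def find_closest_title(title, candidates, cutoff=0.8):
    """Return the candidate most similar to `title`, or None if none is close enough.

    Hypothetical helper: the comparison is case-insensitive and the cutoff
    is an assumed threshold, not a value taken from the original post.
    """
    lowered = {c.lower(): c for c in candidates}
    matches = difflib.get_close_matches(
        title.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


# Assumed link texts, standing in for what the download page might contain.
link_texts = ["Aas Paas Khuda", "Tujhe Bhula Diya", "Anjaana Anjaani"]

scraped = "Aas Pass Khuda"  # spelling as it appears in the lyrics URL

print(scraped in link_texts)                    # exact comparison fails
print(find_closest_title(scraped, link_texts))  # approximate comparison
```

The matched text could then be passed to get_download_url in place of the scraped title. A lower cutoff accepts more spelling variation but also raises the risk of picking the wrong song, so the threshold is worth checking against real titles.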