XPath returns more than expected (python, urllib, lxml)

Time: 2013-01-07 18:49:58

Tags: python xpath lxml urllib

I am trying to retrieve the first download link from a website, but my code returns many more links than that and I am not sure why.

Here is the relevant part of my code:

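# Imports assumed from context (Python 2; on Python 3 FancyURLopener was in urllib.request)
from urllib import FancyURLopener
from lxml import etree

site_search = "http://mp3skull.com/mp3/tubidy.html"
user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0'

# Opener that sends a browser-like User-Agent header
class MyOpener(FancyURLopener, object):
    version = user_agent

myopener = MyOpener()
page = myopener.open(site_search)
html = etree.HTML(page.read())

# Intended to select only the first green link on the page
xpath = "//a[@style = 'color:green;'][1]/@href"
filtered_html = html.xpath(xpath)
print(filtered_html)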

My code returns:

['http://megdadhashem.wapego.ru/files/56727/tubidy_mp3_e2afc5.mp3', 'http://dc357.4shared.com/img/1396413489/41200d37/dlink__2Fdownload_2FOfQCGDtd_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc337.4shared.com/img/1394581159/99fd9e7/dlink__2Fdownload_2FoENbSCE2_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc337.4shared.com/img/1394580769/9e8391f3/dlink__2Fdownload_2Fu4IeKpFK_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc337.4shared.com/img/1386745964/e7f6dcb/dlink__2Fdownload_2F303C_5FUCB_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc337.4shared.com/img/1386568212/616a9b6e/dlink__2Fdownload_2Fcw_5FeT72M_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc540.4shared.com/img/1386000196/b6a127da/dlink__2Fdownload_2FEyTD5P9j_3Ftsid_3D20130107-14410-24515ba/preview.mp3', 'http://dc337.4shared.com/img/1330719927/4f96e0d1/dlink__2Fdownload_2FiYGiVen4_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc242.4shared.com/img/1328992471/f164c1bb/dlink__2Fdownload_2Fsgz8qSBW_3Ftsid_3D00000000-000000-000000/preview.mp3', 'http://dc539.4shared.com/img/1317978255/68f8329d/dlink__2Fdownload_2FSi_5Fka2Pm_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc402.4shared.com/img/1236310800/70345122/dlink__2Fdownload_2FYWU0Aksu_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc150.4shared.com/img/1236293916/681798eb/dlink__2Fdownload_2Fhe_5FMHVoM_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc263.4shared.com/img/1233805806/ab16f2f1/dlink__2Fdownload_2FFp1E7eV8_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc392.4shared.com/img/1194298272/dda6a2b0/dlink__2Fdownload_2Fq1Y3PdRO_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc392.4shared.com/img/1186905892/803a5130/dlink__2Fdownload_2FubH7xctu_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc429.4shared.com/img/1183115738/125793e3/dlink__2Fdownload_2F9Y3zzp-K_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 
'http://dc459.4shared.com/img/1181881278/421221cb/dlink__2Fdownload_2FjstpNTCi_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc453.4shared.com/img/1181881110/18d5b026/dlink__2Fdownload_2F8mmM2BcS_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc120.4shared.com/img/1181875882/25fa514a/dlink__2Fdownload_2F0_5F0UxQuu_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc471.4shared.com/img/1181868760/9121abb8/dlink__2Fdownload_2Fq2ykXJ7Q_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc381.4shared.com/img/1177326344/661ba359/dlink__2Fdownload_2FHztHPN1O_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc201.4shared.com/img/1146076462/de8d83e2/dlink__2Fdownload_2Fqaumhl-G_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc352.4shared.com/img/1142200306/a439f02c/dlink__2Fdownload_2FECiq0Wc8_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc392.4shared.com/img/1137314077/bf9aa3d8/dlink__2Fdownload_2F1ZQOMJ9O_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc362.4shared.com/img/1128611400/34471996/dlink__2Fdownload_2FEF12Czzg_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc196.4shared.com/img/1124868095/a0646612/dlink__2Fdownload_2FruOhPkHz_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc508.4shared.com/img/1124145685/1257f194/dlink__2Fdownload_2FPUqL0qz8_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc337.4shared.com/img/1120296900/3946f5cc/dlink__2Fdownload_2FqsLK3WC9_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc337.4shared.com/img/1091112724/d363d3c4/dlink__2Fdownload_2FylEZuq80_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc394.4shared.com/img/1086814685/542051eb/dlink__2Fdownload_2FiRSSrUEu_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc340.4shared.com/img/1086805965/4423758d/dlink__2Fdownload_2FAXmv12yD_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 
'http://dc397.4shared.com/img/1086804062/6d2abcc4/dlink__2Fdownload_2FIWWI8tmV_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc339.4shared.com/img/1086802960/a99eb9bb/dlink__2Fdownload_2FlxGG5VBU_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc402.4shared.com/img/1086799043/2637e6a9/dlink__2Fdownload_2FSjcCMKQ5_3Ftsid_3D20130107-14410-24515ba/preview.mp3', 'http://dc352.4shared.com/img/1086798986/4d8501c0/dlink__2Fdownload_2Fk1ZHbbCa_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc364.4shared.com/img/1086798016/93968106/dlink__2Fdownload_2FgNBZbBqG_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc253.4shared.com/img/1086794519/4f34e1c4/dlink__2Fdownload_2FBZWIHqC4_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc337.4shared.com/img/1086790487/f7ee8aea/dlink__2Fdownload_2FbvASkRUI_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc337.4shared.com/img/1084754225/3a8f1481/dlink__2Fdownload_2FY2rkufif_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc495.4shared.com/img/1039479528/73f2fa3c/dlink__2Fdownload_2FKWsm3WJ-_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc145.4shared.com/img/975452680/1597c3a2/dlink__2Fdownload_2FQ2VX9l6W_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc252.4shared.com/img/933590669/b1f79b67/dlink__2Fdownload_2F0hbdsF2M_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc120.4shared.com/img/885049589/d1a62f17/dlink__2Fdownload_2FmC_5F1JDXl_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc224.4shared.com/img/884702525/bb0c917b/dlink__2Fdownload_2F46GnfVxK_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc337.4shared.com/img/849169766/fd8d3498/dlink__2Fdownload_2F-hynMHjn_3Ftsid_3D20130107-14410-24515ba/preview.mp3', 'http://dc431.4shared.com/img/844202587/a88f9c21/dlink__2Fdownload_2F85HCohcN_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 
'http://dc184.4shared.com/img/838092829/30bd6ae8/dlink__2Fdownload_2Ffil3BIUA_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc430.4shared.com/img/838091664/324b51b5/dlink__2Fdownload_2FfrzQcwBu_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc441.4shared.com/img/838089810/882d2f3e/dlink__2Fdownload_2FqUMmG5Zl_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc190.4shared.com/img/838088957/cb5b72cb/dlink__2Fdownload_2FiR6VJUSC_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc433.4shared.com/img/838087554/32bca43/dlink__2Fdownload_2Ff3_5Fn7pKY_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc316.4shared.com/img/838086255/c9df8b35/dlink__2Fdownload_2FKMBk8wZI_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc445.4shared.com/img/838084096/55ee8966/dlink__2Fdownload_2FM9Qw6AwI_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc415.4shared.com/img/838082894/9098e62e/dlink__2Fdownload_2F8DGix4I5_3Ftsid_3D20130107-14410-24515ba/preview.mp3', 'http://dc233.4shared.com/img/838081788/99dc7397/dlink__2Fdownload_2Ft-4IKE5C_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc445.4shared.com/img/838081320/6ae1bbf3/dlink__2Fdownload_2FBy95Sp_5FU_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc387.4shared.com/img/838079502/f5b07bd1/dlink__2Fdownload_2FF_5FyXSg9E_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc429.4shared.com/img/842513873/dbab9cf3/dlink__2Fdownload_2FeNLkoppN_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc424.4shared.com/img/827830064/127ba0d9/dlink__2Fdownload_2Fj8emrNnO_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc270.4shared.com/img/822099181/7483e90e/dlink__2Fdownload_2F7xIdA4q6_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc308.4shared.com/img/822092067/d5a08c83/dlink__2Fdownload_2FM26G9oiJ_3Ftsid_3D20130107-14410-24515ba/preview.mp3', 
'http://dc198.4shared.com/img/800516614/3d006c3d/dlink__2Fdownload_2Fdz18B2dB_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc388.4shared.com/img/793902768/4eeb6c1d/dlink__2Fdownload_2FnRMBB2bB_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc362.4shared.com/img/788822785/d1e8e98f/dlink__2Fdownload_2Fqs2Ky8y6_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc183.4shared.com/img/788819652/6b419587/dlink__2Fdownload_2FlnWIeFyL_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc391.4shared.com/img/788813387/c7f33dca/dlink__2Fdownload_2FvmPZSPCp_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc198.4shared.com/img/788809769/eb1c5c4b/dlink__2Fdownload_2F0r6tlUex_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc280.4shared.com/img/788804149/2fcd9aa6/dlink__2Fdownload_2FBZSQjBQM_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc280.4shared.com/img/788803303/35275a6b/dlink__2Fdownload_2FsH4BjUMw_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc198.4shared.com/img/781278584/363504d6/dlink__2Fdownload_2Fof2zYynb_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc405.4shared.com/img/717415145/3d6233e1/dlink__2Fdownload_2FkxXODf-m_3Ftsid_3D20130107-14410-24515ba/preview.mp3', 'http://dc376.4shared.com/img/717284773/98545fac/dlink__2Fdownload_2FrreBjY6x_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc397.4shared.com/img/716180972/4a4cac7d/dlink__2Fdownload_2FPLBlw3hR_3Ftsid_3D20130107-14410-24515ba0/preview.mp3', 'http://dc302.4shared.com/img/7074

I know it would not be hard to take the first link out of this result, but I am curious why I get so many links in the first place.

Thanks.

1 answer:

Answer 0 (score: 2)

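//something[1] returns every something element that is the first something child of its own parent. This is because // is short for /descendant-or-self::node()/, so the [1] predicate is applied to the child step separately under each parent. (//something)[1] evaluates the whole node set first and then returns only the first something in the document.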

So you have to use:

(//a[@style = 'color:green;'])[1]/@href
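
To see the difference without hitting the live site, here is a minimal sketch using lxml on a small made-up HTML snippet. The markup, the example.com URLs, and the variable names are assumptions for illustration only; they are not taken from the page in the question.

```python
from lxml import etree

# Hypothetical sample markup: two parent <div>s, each holding green links.
SAMPLE = """
<html><body>
  <div>
    <a style="color:green;" href="http://example.com/a1">a1</a>
    <a style="color:green;" href="http://example.com/a2">a2</a>
  </div>
  <div>
    <a style="color:green;" href="http://example.com/b1">b1</a>
  </div>
</body></html>
"""

doc = etree.HTML(SAMPLE)

# [1] is applied under each parent: the first green link of every <div>.
per_parent = doc.xpath("//a[@style = 'color:green;'][1]/@href")

# Parentheses build the full node set first, then [1] picks one node.
whole_doc = doc.xpath("(//a[@style = 'color:green;'])[1]/@href")

print(per_parent)  # ['http://example.com/a1', 'http://example.com/b1']
print(whole_doc)   # ['http://example.com/a1']
```

Note that xpath() still returns a list in the second case, so the single link is whole_doc[0] once you have checked the list is not empty.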