I have this code:
import urllib
from bs4 import BeautifulSoup
import time
url = "http://www.downloadcrew.com/article/31121-magix_movie_edit_pro_2014_premium"
pageUrl = urllib.urlopen(url)
time.sleep(2)
soup = BeautifulSoup(pageUrl)
for a in soup.select("div.downloadLink a[href]"):
    print "downloadlink: "+a["href"]
for b in soup.select("h1#articleTitle"):
    print b
for c in soup.select("table.detailsTable"):
    print c
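What I want is the application name, the date updated, the developer, and the download link. When I run it, the output is everything inside each tag, markup included.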
Answer 0: (score: 1)
The following code gets what you want:
import urllib
from bs4 import BeautifulSoup
import time
url = "http://www.downloadcrew.com/article/31121-magix_movie_edit_pro_2014_premium"
pageUrl = urllib.urlopen(url)
time.sleep(2)
soup = BeautifulSoup(pageUrl)
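for a in soup.select("div.downloadLink a[href]"):
    # Keep only the query string: the text between "?" and the first ","
    print "downloadlink: " + "?" + a["href"].split("?")[1].split(",")[0]
for b in soup.select("h1#articleTitle"):
    # The first child of the heading is the bare title text
    print b.contents[0].strip()
for c in soup.findAll("th"):
    # Each label sits in a <th>; its value is the <td> in the same row
    if c.text == "Date Updated:":
        print c.parent.td.text
    elif c.text == "Developer:":
        print c.parent.td.text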
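The label lookup in the loop above can also be collected into a dictionary, which makes the fields easier to reuse. This is only a sketch, written for Python 3 with bs4 installed (the code above targets Python 2). SAMPLE_HTML, its values, and the helper name extract_details are made up for illustration; the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the selectors used above.
# The values are placeholders, not data from the real page.
SAMPLE_HTML = """
<h1 id="articleTitle">Sample App 1.0</h1>
<table class="detailsTable">
  <tr><th>Date Updated:</th><td>2014-01-01</td></tr>
  <tr><th>Developer:</th><td>Example Developer</td></tr>
  <tr><th>License:</th><td>Trial</td></tr>
</table>
"""


def extract_details(html, wanted=("Date Updated:", "Developer:")):
    """Return the title plus the text of the <td> next to each wanted <th>."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}

    # get_text() returns only the text, not the surrounding markup.
    title = soup.select_one("h1#articleTitle")
    if title is not None:
        details["Name"] = title.get_text(strip=True)

    for th in soup.select("table.detailsTable th"):
        label = th.get_text(strip=True)
        if label in wanted:
            td = th.find_next_sibling("td")
            if td is not None:
                details[label] = td.get_text(strip=True)
    return details


details = extract_details(SAMPLE_HTML)
for label, value in details.items():
    print(label, value)
```

Printing a tag object outputs its full markup, which is why the original loops showed everything inside each tag; get_text() (or .text) returns only the text content.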
However, you cannot download the file with that URL. You need to inspect the JavaScript source to see what javascript:checkDownload() does to obtain the actual file location.
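As a starting point for that inspection, here is a rough Python 3 sketch that pulls the query string out of such an href without raising an error when the "?" is missing, then splits it into parameters. The sample href, its parameter names, and the helper query_from_href are assumptions for illustration only; the real arguments passed to checkDownload() have to be read from the page itself.

```python
from urllib.parse import parse_qs


def query_from_href(href):
    """Return the text between the first '?' and the next ',', or None."""
    _, sep, rest = href.partition("?")
    if not sep:
        # No query string in this href.
        return None
    # Drop any quotes or spaces left over from the JavaScript call.
    return rest.split(",", 1)[0].strip("'\" ")


# Hypothetical href; the real one on the page may look different.
sample_href = "javascript:checkDownload('download.php?id=31121&src=article', 0)"

query = query_from_href(sample_href)
params = parse_qs(query) if query else {}
print(query)
print(params)
```

Whether these parameters are enough to build a working download URL depends on what checkDownload() does with them.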