Question

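I have a Python script that scrapes specific links from a list of links. The problem is that whenever I load one of these links through the script, the specific link is not visible, but if I open the same link (the one that contains the sub-link) in a browser, the sub-link opens.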

For example:

<a href="http://daclips.in/qx9ecuy1geum" class="push_button blue" style="width:290px; height:70px; font-weight:normal; font-size:22px; line-height:65px; margin:0px auto 20px auto;">Click Here to Play</a>

I am trying to extract this link from the href (http://onwatchseries.to/cale.html?r=aHR0cDovL2RhY2xpcHMuaW4vNzhzNmE4M3Zra2Y2). The link loads in the browser, but if I open the same link through the script, I get:

document.write('<a href="' + decoded + '" class="push_button blue" style="width:290px; height:70px; font-weight:normal; font-size:22px; line-height:65px; margin:0px auto 20px auto;">Click Here to Play</a>');

How can I solve this?

Here is my script.

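import re
import urllib2

from bs4 import BeautifulSoup

dicts = {}  # episode number -> redirect links found on that episode page
links = []  # every redirect link, in the order it was found

for i in range(1, 25):
    dicts.setdefault(str(i), [])
    url = "http://onwatchseries.to/episode/seinfeld_s4_e"+str(i)+".html"
    content = urllib2.urlopen(url).read()
    soup = BeautifulSoup(content,"lxml")
    for link in soup.find_all('a',{'title':'daclips.in'}):
        links.append(link.get('href'))
        dicts[str(i)].append(link.get('href'))


# Open each redirect page and look for the final daclips.in anchor
for k in links:
    c = urllib2.urlopen(k).read()
    s = BeautifulSoup(c,"lxml")
    for m in s.findAll('a', attrs={'href': re.compile("^http://daclips.in/")}):
        print m.get('href')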

This script produces no output. I tried sleeping for 10 seconds, but that did not help either.

Answer 1

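As one of the comments pointed out, you will probably need Selenium to navigate the page as you see it in your browser. Selenium plus a webdriver (PhantomJS, Chromedriver, Firefox) lets you access the page just as a browser would. If you do not want a browser window to open, your best option is PhantomJS.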

from bs4 import BeautifulSoup
from selenium import webdriver  # the module name is lowercase
from time import sleep

url = 'your URL'
browser = webdriver.PhantomJS('path to webdriver')
browser.get(url)
sleep(5)  # give the page's JavaScript time to write the link
# ... your find_element code ...

Also, you need the sleep to give the page time to load (or use WebDriverWait()).
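
To make the WebDriverWait() suggestion concrete, here is a minimal sketch, written for Python 3 and the Selenium 4 API. It waits until an anchor pointing at daclips.in exists instead of sleeping for a fixed time. The function names, the CSS selector, the 10-second timeout and the choice of headless Firefox are my assumptions, not something from the question; PhantomJS support was removed in Selenium 4, so a headless Firefox or Chrome is the usual substitute. The second helper rests on a further assumption: that the r= parameter of the cale.html URL is simply the target link in base64, which is what the `decoded` variable in the document.write output hints at. If that holds, no browser is needed at all.

```python
import base64
from urllib.parse import parse_qs, urlparse

# Assumed selector: any anchor whose href starts with the daclips.in prefix.
DACLIPS_SELECTOR = 'a[href^="http://daclips.in/"]'


def fetch_daclips_links(url, timeout=10):
    """Render `url` in a browser and return the daclips.in hrefs it contains."""
    # Imported here so the decoding helper below works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")  # assumption: no visible window wanted
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Block until JavaScript has written the anchor, or raise on timeout.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, DACLIPS_SELECTOR))
        )
        anchors = driver.find_elements(By.CSS_SELECTOR, DACLIPS_SELECTOR)
        return [a.get_attribute("href") for a in anchors]
    finally:
        driver.quit()


def decode_redirect_target(redirect_url):
    """Decode the `r` query parameter, ASSUMING it is plain base64."""
    encoded = parse_qs(urlparse(redirect_url).query)["r"][0]
    # parse_qs turns '+' into a space; undo that before decoding.
    encoded = encoded.replace(" ", "+")
    # Restore any '=' padding that may have been stripped from the parameter.
    encoded += "=" * (-len(encoded) % 4)
    return base64.b64decode(encoded).decode("utf-8")
```

For the URL in the question, decode_redirect_target("http://onwatchseries.to/cale.html?r=aHR0cDovL2RhY2xpcHMuaW4vNzhzNmE4M3Zra2Y2") gives http://daclips.in/78s6a83vkkf6, so decoding may be enough on its own; fetch_daclips_links() is the fallback if the site changes how the link is encoded.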

Link not visible when extracted from a Python script with BeautifulSoup
