I was able to collect data from a web page using this:
import requests
import lxml.html
import re
url = "http://animesora.com/flying-witch-episode-7-english-subtitle/"
r = requests.get(url)
page = r.content
dom = lxml.html.fromstring(page)
for link in dom.xpath('//div[@class="downloadarea"]//a/@href'):
    down = re.findall('https://.*',link)
    print (down)
When I try to collect more data from the results of the code above, I get this error:
Traceback (most recent call last):
  File "/home/sven/PycharmProjects/untitled1/.idea/test4.py", line 21, in <module>
    r2 = requests.get(down)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 590, in send
    adapter = self.get_adapter(url=request.url)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 672, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '['https://link.safelinkconverter.com/review.php?id=aHR0cDovLygqKC5fKC9zTGZYZ0s=&c=1&user=51757']'
This is the code I am using:
for link2 in down:
    r2 = requests.get(down)
    page2 = r.url
    dom2 = lxml.html.fromstring(page2)
    for link2 in dom2('//div[@class="button green"]//onclick'):
        down2 = re.findall('.*',down2)
        print (down2)
Answer 0 (score: 0)
You are passing in the whole list:
for link2 in down:
    r2 = requests.get(down)
Note how you pass in down, not link2. down is a list, not a single URL string.
Pass in link2 instead:
for link2 in down:
    r2 = requests.get(link2)
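To see where the bracketed value in the error comes from, the message can be reproduced without touching the network. This is a minimal sketch, assuming Python 3; the sample URL is one of the hrefs from the question and the message template is copied from the traceback, not taken from the requests source directly.

```python
import re

# One of the hrefs from the question, used here only as sample input.
link = "https://link.safelinkconverter.com/review.php?id=aHR0cDovLygqKC5fKC9zTGZYZ0s=&c=1&user=51757"

# re.findall always returns a list, even when there is exactly one match.
down = re.findall('https://.*', link)

# Formatting the list into the message template from the traceback
# reproduces the bracketed value that requests complained about.
message = "No connection adapters were found for '%s'" % down
print(message)

# Iterating over the list yields the plain URL strings requests expects.
for link2 in down:
    print(type(link2).__name__, link2)
```

The square brackets and quotes inside the message are the list's own representation, which is why requests could not find a URL scheme it recognised.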
I'm not sure why you are using a regular expression at all. In the loop
for link in dom.xpath('//div[@class="downloadarea"]//a/@href'):
each link is already a fully qualified URL:
>>> for link in dom.xpath('//div[@class="downloadarea"]//a/@href'):
...     print link
...
https://link.safelinkconverter.com/review.php?id=aHR0cDovLygqKC5fKC9FZEk2Qg==&c=1&user=51757
https://link.safelinkconverter.com/review.php?id=aHR0cDovLygqKC5fKC95Tmg2Qg==&c=1&user=51757
https://link.safelinkconverter.com/review.php?id=aHR0cDovLygqKC5fKC93dFBmVFg=&c=1&user=51757
https://link.safelinkconverter.com/review.php?id=aHR0cDovLygqKC5fKC9zTGZYZ0s=&c=1&user=51757
You don't need to do any further processing on it.
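The same point can be checked offline against a small inline document. This is only a sketch, assuming Python 3 with lxml installed; the HTML below is a hypothetical snippet shaped like the page in the question, and the real markup may differ.

```python
import lxml.html

# Hypothetical snippet shaped like the page in the question.
html = """
<div class="downloadarea">
  <a href="https://link.safelinkconverter.com/review.php?id=aHR0cDovLygqKC5fKC9FZEk2Qg==&amp;c=1&amp;user=51757">Mirror 1</a>
  <a href="https://link.safelinkconverter.com/review.php?id=aHR0cDovLygqKC5fKC95Tmg2Qg==&amp;c=1&amp;user=51757">Mirror 2</a>
</div>
"""

dom = lxml.html.fromstring(html)

# Selecting @href yields the attribute values directly, as strings,
# so no regular expression is needed to pull the URL out.
links = dom.xpath('//div[@class="downloadarea"]//a/@href')

for link in links:
    print(link)
```

Each item in links can be handed to requests.get() as it is.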
Your remaining code has more errors: you confused r2.url with r2.content, and you left out the .xpath part of the dom2.xpath(...) query.
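Putting those fixes together, the second stage might look roughly like the sketch below. Several things here are assumptions rather than anything confirmed by the question: the helper name collect_onclicks is made up, the session argument is any object with a requests-style .get(url) method (for example requests.Session()), and the XPath guesses that onclick is an attribute on the "button green" div itself.

```python
import lxml.html


def collect_onclicks(urls, session):
    """Fetch each URL and return the onclick values of the 'button green' divs.

    `session` is any object with a requests-style .get(url) method,
    for example requests.Session(). The XPath is a guess at the intent.
    """
    results = []
    for link2 in urls:
        r2 = session.get(link2)   # pass one URL string, not the whole list
        page2 = r2.content        # the response body, not r2.url
        dom2 = lxml.html.fromstring(page2)
        # onclick is an attribute, so it is selected with @onclick.
        results.extend(dom2.xpath('//div[@class="button green"]/@onclick'))
    return results
```

With requests installed, calling collect_onclicks(links, requests.Session()) would run it against the real pages; whether the target site actually serves such a div is not something the question shows.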