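I am using Python, Selenium, Tweepy, and OpenSSL to inspect links collected from tweet data. Basically, my code checks whether a tweet contains a link, checks whether the link is http or https, and, if it is https, checks whether the certificate has expired. This is the chunk of code that does that: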
if rsecure.search(s1) != None:
    #driver.get(s1)
    cert=ssl.get_server_certificate((s1, 443))
    x509 = OpenSSL.crypto.load_certificate(OpenSSL.crypto.FILETYPE_PEM, cert, ssl_version=ssl.PROTOCOL_SSLv23)
    if x509.has_expired():
        print("Expired Cert")
    else:
        print( "Good Link")
        print(driver.current_url)
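Everything works, including checking the phrase I enter and printing the bad http links, except for this part of the code. When execution reaches it, it does not print the link; instead it prints: [Errno -2] Name or service not known. I have looked around and found very little that helps with this error. I think it is related to the OpenSSL part, which I don't understand well.
Any ideas? Edit: Occasionally it also prints this error: encoding with 'idna' codec failed (UnicodeError: label too long)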
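One possible cause, assuming s1 holds a full URL (as driver.current_url returns) rather than a bare hostname: ssl.get_server_certificate expects a (hostname, port) pair, so a string such as "https://t.co/abc123" would be handed to the DNS resolver as if it were a host name. That would be consistent with both "Name or service not known" and the 'idna' "label too long" error, since a long URL path can exceed the length limit for a single host-name label. The sketch below assumes Python 3; host_and_port and fetch_pem_certificate are hypothetical helper names, not part of any library.

```python
import ssl
from urllib.parse import urlsplit


def host_and_port(url, default_port=443):
    """Split a URL into the (hostname, port) pair that
    ssl.get_server_certificate expects. Hypothetical helper."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no hostname found in {url!r}")
    # Fall back to the default HTTPS port when the URL does not name one.
    return parts.hostname, parts.port or default_port


def fetch_pem_certificate(url):
    """Fetch the server certificate as PEM text.
    Opens a network connection to the host, so it can raise socket/SSL errors."""
    return ssl.get_server_certificate(host_and_port(url))
```

The returned PEM text could then be passed to OpenSSL.crypto.load_certificate(OpenSSL.crypto.FILETYPE_PEM, pem). As far as I can tell, that function takes only the file type and the buffer, so the ssl_version keyword in the snippet above would probably raise a TypeError once the hostname problem is out of the way.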
Edit: here is the rest of the code that comes before that part, for more context:
import tweepy
import re
from selenium import webdriver
from pyvirtualdisplay import Display
import time
from OpenSSL import SSL
import OpenSSL
import ssl, socket
PYOPENSSL = True
#from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
#binary = FirefoxBinary('/ex50/bin/geckodriver.exe')
display = Display(visible=0, size=(800, 800))
display.start()
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--incognito")
driver = webdriver.Chrome('/usr/local/rvm/gems/ruby-2.4.0/bin/chromedriver', chrome_options=chrome_options)
# context = ssl.create_default_context()
# conn = context.wrap_socket(
#     socket.socket(socket.AF_INET),
#     server_hostname=hostname)
# ssl_info = conn.getpeercert()
# print(ssl_info)
securer = r'https:\S*'
badr = r'http:\S*'
rsecure = re.compile(securer)
rbad = re.compile(badr)
class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        try:
            if 'http' in status.text:
                if rsecure.search(status.text) != None:
                    driver.get(rsecure.search(status.text).group())
                    s1 = driver.current_url
                elif rbad.search(status.text) != None:
                    driver.get(rbad.search(status.text).group())
                    s1 = driver.current_url