PyCurl request hangs infinitely on perform

Date: 2017-10-12 09:39:09

Tags: python api hang pycurl qualys

I have written a script to fetch scan results from Qualys to be run each week for the purpose of metrics gathering.

The first part of this script involves fetching a list of references for each of the scans that were run in the past week for further processing.

The problem is that, while this works perfectly sometimes, at other times the script hangs on the c.perform() line. This is manageable when running the script manually, as it can simply be re-run until it works. However, I am looking to run this as a scheduled task each week without any manual interaction.

Is there a foolproof way that I can detect if a hang has occurred and resend the PyCurl request until it works?

I have tried setting the c.TIMEOUT and c.CONNECTTIMEOUT options, but these don't seem to be effective. Also, since no exception is thrown, simply wrapping the call in a try/except block won't work either.

The function in question is below:

import datetime as DT
from io import BytesIO

import certifi
import pycurl

# Retrieve a list of all scans conducted in the past week
# Save this to refs_raw.txt
def getScanRefs(usr, pwd):

    print("getting scan references...")

    with open('refs_raw.txt','wb') as refsraw: 
        today = DT.date.today()
        week_ago = today - DT.timedelta(days=7)
        strtoday = str(today)
        strweek_ago = str(week_ago)

        c = pycurl.Curl()

        c.setopt(c.URL, 'https://qualysapi.qualys.eu/api/2.0/fo/scan/?action=list&launched_after_datetime=' + strweek_ago + '&launched_before_datetime=' + strtoday)
        c.setopt(c.HTTPHEADER, ['X-Requested-With: pycurl', 'Content-Type: text/xml'])
        c.setopt(c.USERPWD, usr + ':' + pwd)
        c.setopt(c.POST, 1)
        c.setopt(c.PROXY, 'companyproxy.net:8080')
        c.setopt(c.CAINFO, certifi.where())
        c.setopt(c.SSL_VERIFYPEER, 0)
        c.setopt(c.SSL_VERIFYHOST, 0)
        c.setopt(c.CONNECTTIMEOUT, 3)
        c.setopt(c.TIMEOUT, 3)

        refsbuffer = BytesIO()
        c.setopt(c.WRITEDATA, refsbuffer)
        c.perform()

        body = refsbuffer.getvalue()
        refsraw.write(body)
        c.close()

    print("Got em!")

2 Answers:

Answer 0 (score: 1):

I fixed this myself by using multiprocessing to launch the API call in a separate process, which is terminated and relaunched if it runs for longer than 5 seconds. It isn't pretty, but it is cross-platform. For those looking for a solution that is more elegant but only works on *nix, look into the signal library, specifically SIGALRM.

The code is below:

import datetime as DT
import multiprocessing
import time
from io import BytesIO

import certifi
import pycurl

# As this request for scan references sometimes hangs, it is run in a separate process here
# The process is terminated and relaunched if no response is received within 5 seconds
def performRequest(usr, pwd):
    today = DT.date.today()
    week_ago = today - DT.timedelta(days=7)
    strtoday = str(today)
    strweek_ago = str(week_ago)

    c = pycurl.Curl()

    c.setopt(c.URL, 'https://qualysapi.qualys.eu/api/2.0/fo/scan/?action=list&launched_after_datetime=' + strweek_ago + '&launched_before_datetime=' + strtoday)
    c.setopt(c.HTTPHEADER, ['X-Requested-With: pycurl', 'Content-Type: text/xml'])
    c.setopt(c.USERPWD, usr + ':' + pwd)
    c.setopt(c.POST, 1)
    c.setopt(c.PROXY, 'companyproxy.net:8080')
    c.setopt(c.CAINFO, certifi.where())
    c.setopt(c.SSL_VERIFYPEER, 0)
    c.setopt(c.SSL_VERIFYHOST, 0)

    refsBuffer = BytesIO()
    c.setopt(c.WRITEDATA, refsBuffer)
    c.perform()
    c.close()
    body = refsBuffer.getvalue()
    with open('refs_raw.txt', 'wb') as refsraw:
        refsraw.write(body)

# Retrieve a list of all scans conducted in the past week
# Save this to refs_raw.txt
def getScanRefs(usr, pwd):

    print("Getting scan references...") 

    # Occasionally the request hangs indefinitely. Launch it in a separate process and retry if there is no response within 5 seconds
    success = False
    while not success:
        sendRequest = multiprocessing.Process(target=performRequest, args=(usr, pwd))
        sendRequest.start()

        for seconds in range(5):
            print("...")
            time.sleep(1)

        if sendRequest.is_alive():
            print("Maximum allocated time reached... Resending request")
            sendRequest.terminate()
            del sendRequest
        else:
            success = True

    print("Got em!")

Answer 1 (score: 0):

This question is old, but I am adding this answer as it may help someone.

The only way to terminate a running curl transfer once perform() has been called is from a callback:

1 - Using CURLOPT_WRITEFUNCTION, as the documentation states:

Your callback should return the number of bytes actually taken care of. If that amount differs from the amount passed to your callback function, it signals an error condition to the library. This causes the transfer to get aborted and the libcurl function used returns CURLE_WRITE_ERROR.

The drawback of this method is that curl only calls the write function when it receives new data from the server, so if the server stops sending data, curl just keeps waiting and your callback never gets the chance to return the termination signal.

2 - The other, better method is to use the progress callback:

The good thing about the progress callback is that curl calls it roughly once per second even when no data is arriving from the server, which gives you the chance to abort the transfer by returning a non-zero value from the callback.

Use the option CURLOPT_XFERINFOFUNCTION; note that this is better than using CURLOPT_PROGRESSFUNCTION, as the documentation says:

We recommend users to use the newer CURLOPT_XFERINFOFUNCTION instead, if you can.

You also need to set the option CURLOPT_NOPROGRESS:

CURLOPT_NOPROGRESS must be set to 0 to make this function actually get called.

Here is an example showing a write and a progress callback implementation in Python:

# example of using write and progress callbacks to terminate a running transfer
import pycurl

f = open('mynewfile', 'wb')  # used to save downloaded data
counter = 0

# define callback functions which will be used by curl
def my_write_func(data):
    """Write received data to the file and track how much has arrived."""
    global counter
    f.write(data)
    counter += len(data)

    # tell curl to abort once more than 1024 bytes have been downloaded by returning -1
    # (or any number not equal to len(data)); libcurl then fails with CURLE_WRITE_ERROR
    if counter >= 1024:
        return -1

def progress(download_total, downloaded, upload_total, uploaded):
    """Receive progress figures from curl."""
    # tell curl to abort by returning a non-zero value once more than 1024 bytes
    # have been downloaded; returning 0 (or None) lets the transfer continue
    if downloaded >= 1024:
        return 1


# initialize curl object and options
c = pycurl.Curl()
c.setopt(pycurl.URL, 'http://example.com/bigfile')  # example URL; use the resource you actually want

# callback options
c.setopt(pycurl.WRITEFUNCTION, my_write_func)

c.setopt(pycurl.NOPROGRESS, 0)  # required, otherwise the progress function is never called
c.setopt(pycurl.XFERINFOFUNCTION, progress)
# c.setopt(pycurl.PROGRESSFUNCTION, progress)  # older alternative; pycurl.XFERINFOFUNCTION is recommended
# put other curl options as required

# executing curl; an abort from either callback makes perform() raise pycurl.error
try:
    c.perform()
except pycurl.error as e:
    print("transfer aborted:", e)
finally:
    c.close()
    f.close()
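
Applied to the original question, the same progress callback can abort on elapsed time rather than on bytes received, which ends a hung request without needing a second process. This is only a rough sketch; the 30-second budget is an arbitrary example value, and the remaining options are the ones from the question:

import time

import pycurl

start = time.monotonic()

def abort_after_30s(download_total, downloaded, upload_total, uploaded):
    # returning a non-zero value tells curl to abort the transfer
    if time.monotonic() - start > 30:
        return 1

c = pycurl.Curl()
c.setopt(pycurl.URL, 'https://qualysapi.qualys.eu/api/2.0/fo/scan/?action=list')  # plus the date filters as in the question
c.setopt(pycurl.NOPROGRESS, 0)
c.setopt(pycurl.XFERINFOFUNCTION, abort_after_30s)
# ... the remaining options (headers, USERPWD, PROXY, CAINFO, WRITEDATA) as in the question ...

try:
    c.perform()
except pycurl.error as e:
    # an aborted transfer surfaces here (CURLE_ABORTED_BY_CALLBACK), so the caller can retry
    print("request aborted or failed:", e)
finally:
    c.close()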