How to create a new column from the result of a function

Time: 2018-11-15 12:47:01

Tags: python pandas for-loop if-statement pygsheets

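I am currently running the script below, which checks a long list of URLs for errors. The code first finds the unique URLs in df['Final_URL'], tests each one individually, and reports the status of that URL. When I run it, I get the current output in my notebook, which is fine. Now I want to push the status (e.g. 200, 404, BAD) into a new column in df called 'Status', for every row whose URL equals one of the unique URLs collected at the start of the code.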

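What is the best way to create the new column df['Status']? Also, since I want to export this to Google Sheets, do you know whether text color is preserved when updating cells with pygsheets?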

Input code:
import re
import time

import requests

# df is an existing DataFrame with a 'Final_URL' column
# get unique urls and check for errors
URLS = []

for unique_link in df['Final_URL'].unique():
    URLS.append(unique_link)

try:

    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    ENDC = '\033[0m'

    def main():
        while True:
            print ("\nTesting URLs.", time.ctime())
            checkUrls()
            time.sleep(10) #Sleep 10 seconds
            break

    def checkUrls():     
        for url in URLS:
            status = "N/A"
            try:
                #check if regex contains bet3.com
                if re.search(r".*bet3\.com.*", url):
                    status = checkUrl(url)
                else:
                    status = "BAD"

            except requests.exceptions.ConnectionError:
                status = "DOWN"

            printStatus(url, status)

            #for x in df['Final_URL']:
            #    if x == url:
            #        df['Status'] = printStatus(status)



    def checkUrl(url):
        r = requests.get(url, timeout=5)
        #print r.status_code
        return str(r.status_code)

    def printStatus(url, status):
        color = GREEN

        if status != "200":
            color=RED

        print (color+status+ENDC+' '+ url)



    #
    # Main app
    #
    if __name__ == '__main__':
        main()

except:

    print('Something went wrong!')



Current output:

200 https://www.bet3.com/dl/~offer
404 http://extra.bet3.com/promotions/en/soccer/soccer-accumulator-bonus
BAD https://extra.betting3.com/features/en/bet-builder
200 https://www.bet3.com/dl/6

1 Answer:

Answer 0 (score: 2)

You can rewrite the function like this:

import re

import requests

def checkUrl(url):
    # Only URLs containing bet3.com are requested; anything else is BAD.
    if re.search(r".*bet3\.com.*", url):
        try:
            r = requests.get(url, timeout=5)
        except requests.exceptions.ConnectionError:
            return 'DOWN'
        return str(r.status_code)
    return 'BAD'

Then apply it like this:

df['Status'] = df['Final_URL'].apply(checkUrl)
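To try the apply pattern without hitting the network, here is a minimal sketch. check_url_stub is a hypothetical stand-in for checkUrl (it makes no HTTP request and simply pretends every bet3.com URL returns 200), and the small df is made-up sample data built from the URLs in the question.

```python
import pandas as pd

calls = []  # records every URL the stub is asked about

def check_url_stub(url):
    """Hypothetical offline stand-in for checkUrl: no HTTP request is made."""
    calls.append(url)
    return '200' if 'bet3.com' in url else 'BAD'

# Made-up sample frame; the first and last rows share a URL.
df = pd.DataFrame({'Final_URL': [
    'https://www.bet3.com/dl/6',
    'https://extra.betting3.com/features/en/bet-builder',
    'https://www.bet3.com/dl/6',
]})

# apply calls the function for each row and stores the result in the new column.
df['Status'] = df['Final_URL'].apply(check_url_stub)

print(df)
print(len(calls), 'calls for', df['Final_URL'].nunique(), 'unique URLs')
```

Every row gets its own status, and the calls list shows which URLs the function was asked about.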

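However, as user32185 pointed out, if there are duplicate URLs, this requests each of them once per occurrence rather than once overall.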

To avoid that, you can follow user32185's suggestion and write the function like this:

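import re

import pandas as pd
import requests

def checkUrls(urls):
    results = []
    for url in urls:
        if re.search(r".*bet3\.com.*", url):
            try:
                r = requests.get(url, timeout=5)
            except requests.exceptions.ConnectionError:
                results.append([url, 'DOWN'])
            else:
                # Record the status code only when the request succeeded;
                # otherwise r would be undefined or left over from an earlier URL.
                results.append([url, str(r.status_code)])
        else:
            results.append([url, 'BAD'])
    return pd.DataFrame(data=results, columns=['Final_URL', 'Status'])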

Then use it like this:

status_df = checkUrls(df['Final_URL'].unique())
df = df.merge(status_df, how='left', on='Final_URL')
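As an alternative to the merge, the per-URL statuses can also be kept in a plain dictionary and mapped back onto the column with Series.map. This is only a sketch, not part of the original answer: check_url_stub is a hypothetical offline stand-in for checkUrl, and the sample df is made up from the URLs in the question.

```python
import pandas as pd

calls = []  # records every URL the stub is asked about

def check_url_stub(url):
    """Hypothetical offline stand-in for checkUrl: no HTTP request is made."""
    calls.append(url)
    return '200' if 'bet3.com' in url else 'BAD'

# Made-up sample frame; the first and last rows share a URL.
df = pd.DataFrame({'Final_URL': [
    'https://www.bet3.com/dl/6',
    'https://extra.betting3.com/features/en/bet-builder',
    'https://www.bet3.com/dl/6',
]})

# Check each distinct URL exactly once and remember the result.
status_by_url = {url: check_url_stub(url) for url in df['Final_URL'].unique()}

# Look up the stored status for every row, duplicates included.
df['Status'] = df['Final_URL'].map(status_by_url)

print(df)
```

Swapping check_url_stub for the real checkUrl would keep the one-request-per-unique-URL behavior while leaving the row order and index of df untouched.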