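I'm currently running the script below, which checks a long list of URLs for errors. The code first finds the unique URLs in df['Final_URL'], tests each one, and returns the status of that URL. When I run it, I get the current output shown further down in my notebook, which is fine. Now I'd like to push the status codes (e.g. 200, 404, BAD) into a new column in df called 'Status', for every row whose URL equals one of the unique URLs collected at the start of the code.
What is the best way to create the new column df['Status']? Also, since I want to export this to Google Sheets, do you know whether the text color is preserved when updating cells with pygsheets?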
Input code:
#get unique urls and check for errors
URLS = []
for unique_link in df['Final_URL'].unique():
    URLS.append(unique_link)
try:
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    ENDC = '\033[0m'
    def main():
        while True:
            print ("\nTesting URLs.", time.ctime())
            checkUrls()
            time.sleep(10) #Sleep 10 seconds
            break
    def checkUrls():
        for url in URLS:
            status = "N/A"
            try:
                #check if regex contains bet3.com
                if re.search(".*bet3\.com.*", url):
                    status = checkUrl(url)
                else:
                    status = "BAD"
            except requests.exceptions.ConnectionError:
                status = "DOWN"
            printStatus(url, status)
            #for x in df['Final_URL']:
            #    if x == url:
            #        df['Status'] = printStatus(status)
    def checkUrl(url):
        r = requests.get(url, timeout=5)
        #print r.status_code
        return str(r.status_code)
    def printStatus(url, status):
        color = GREEN
        if status != "200":
            color=RED
        print (color+status+ENDC+' '+ url)
    #
    # Main app
    #
    if __name__ == '__main__':
        main()
except:
    print('Something went wrong!')
Current output:
200 https://www.bet3.com/dl/~offer
404 http://extra.bet3.com/promotions/en/soccer/soccer-accumulator-bonus
BAD https://extra.betting3.com/features/en/bet-builder
200 https://www.bet3.com/dl/6
Answer 0 (score: 2)
You can rewrite the function like this:
import re
import requests

def checkUrl(url):
    # only URLs on bet3.com are requested; everything else is flagged
    if re.search(r".*bet3\.com.*", url):
        try:
            r = requests.get(url, timeout=5)
        except requests.exceptions.ConnectionError:
            return 'DOWN'
        return str(r.status_code)
    return 'BAD'
Then apply it like this:
df['Status'] = df['Final_URL'].apply(checkUrl)
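One detail worth checking when using `timeout=5`: in `requests`, a read timeout raises `requests.exceptions.ReadTimeout`, which is not a subclass of `ConnectionError`, so it would escape the `except` clause above and stop the `apply`. A minimal sketch of a more defensive variant follows. The name `check_url_safe`, the injectable `get` parameter, and the `'TIMEOUT'` label are assumptions made for illustration and are not part of the original answer.

```python
import requests

def check_url_safe(url, get=requests.get):
    """Hypothetical variant of checkUrl that also handles timeouts.

    `get` is injectable only so the function can be exercised
    without network access; by default it is requests.get.
    """
    try:
        r = get(url, timeout=5)
    except requests.exceptions.Timeout:
        # covers both connect and read timeouts
        return 'TIMEOUT'
    except requests.exceptions.RequestException:
        # base class of the exceptions raised by requests
        return 'DOWN'
    return str(r.status_code)
```

It could be used in place of `checkUrl` inside the bet3.com branch; whether timeouts deserve their own label or should simply count as `'DOWN'` is a matter of preference.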
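Note, as user32185 pointed out, that with apply any URL that appears more than once in the column will be requested once per occurrence.
To avoid that, you can follow user32185's suggestion and write the function as follows: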
import re
import pandas as pd
import requests

def checkUrls(urls):
    results = []
    for url in urls:
        if re.search(r".*bet3\.com.*", url):
            try:
                r = requests.get(url, timeout=5)
            except requests.exceptions.ConnectionError:
                results.append([url, 'DOWN'])
            else:
                # only reached when the request succeeded, so r is defined
                results.append([url, str(r.status_code)])
        else:
            results.append([url, 'BAD'])
    return pd.DataFrame(data=results, columns=['Final_URL', 'Status'])
Then use it like this:
status_df = checkUrls(df['Final_URL'].unique())
df = df.merge(status_df, how='left', on='Final_URL')
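As a variation on the same idea, the statuses for the unique URLs can be collected in a dict and written back with `Series.map`, which avoids the intermediate DataFrame. The sketch below is only an illustration: `fake_check` is a hypothetical stand-in for `checkUrl` that makes no network requests, and the `calls` list exists only to show that each distinct URL is checked once.

```python
import re
import pandas as pd

def fake_check(url):
    """Hypothetical stand-in for checkUrl; performs no HTTP request."""
    if not re.search(r"bet3\.com", url):
        return "BAD"
    return "404" if "promotions" in url else "200"

df = pd.DataFrame({"Final_URL": [
    "https://www.bet3.com/dl/6",
    "http://extra.bet3.com/promotions/en/soccer/soccer-accumulator-bonus",
    "https://www.bet3.com/dl/6",  # duplicate row
    "https://extra.betting3.com/features/en/bet-builder",
]})

calls = []  # records every URL that actually gets checked

def counted_check(url):
    calls.append(url)
    return fake_check(url)

# check each distinct URL once, then broadcast the result to all rows
status_by_url = {url: counted_check(url) for url in df["Final_URL"].unique()}
df["Status"] = df["Final_URL"].map(status_by_url)
```

With the real `checkUrl` in place of `fake_check`, the result should match the merge approach; `map` just assigns the column in place instead of producing a new DataFrame.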